NUCLEIC ACIDS AND PROTEINS FROM CENARCHAEUM SYMBIOSUM
Background of the Invention 

The identification and characterization of organisms which inhabit a diverse range of ecosystems leads to a
greater understanding of the operation of such ecosystems. In addition, because the physiology of such organisms is 
adapted to function in the particular habitat which the organism inhabits, the enzymes which carry out the organism's 
physiological processes may possess characteristics which provide advantages when they are utilized in therapeutic 
procedures, industrial applications, or research applications. Furthermore, by determining the sequences of these
organisms' genes, insight into their biochemical pathways and processes may be gained without the necessity of
culturing the organisms in the laboratory, thereby enabling the physiological characterization of organisms which are 
recalcitrant to growth in the laboratory. 

Molecular phylogenetic surveys have recently revealed an ecologically widespread Crenarchaeal group that 
inhabits cold and temperate terrestrial and marine environments. To date these organisms have resisted isolation in
pure culture, so their phenotypic and genotypic characteristics remain largely unknown. In order to characterize the
physiology of these archaea, to develop methodological approaches for characterizing uncultivated microorganisms
and identifying their presence in a sample, and to identify enzymes produced by these archaea which may be useful in
therapeutic, industrial, or laboratory applications, genomic analyses of the non-thermophilic crenarchaeote
Cenarchaeum symbiosum were undertaken.

Non-thermophilic Crenarchaeota are one of the more abundant, widespread and frequently recovered
prokaryotic groups revealed by molecular phylogenetic approaches. These microorganisms were originally detected in
high abundance in temperate ocean waters and polar seas. (DeLong, E. F. 1992. Archaea in coastal marine
environments. Proc. Natl. Acad. Sci. 89, 5685-5689; DeLong, E. F. et al. 1994. High abundance of Archaea in
Antarctic marine picoplankton. Nature 371, 695-697; Fuhrman, J. A. et al. 1992. Novel major archaebacterial
group from marine plankton. Nature 356, 148-149; Massana, R. et al. 1997. Vertical distribution and phylogenetic
characterization of marine planktonic Archaea in the Santa Barbara Channel. Appl. Env. Microb. 63, 50-56;
McInerney, J. O. et al. 1995. Recovery and phylogenetic analysis of novel archaeal rRNA sequences from a deep-sea
deposit feeder. Appl. Env. Microb. 61, 1646-1648; Preston, C. M. et al. 1996. A psychrophilic crenarchaeon inhabits a
marine sponge: Cenarchaeum symbiosum gen. nov., sp. nov. Proc. Natl. Acad. Sci. USA 93, 6241-6246)

Representatives have now been reported in terrestrial environments and freshwater lake sediments, indicating a
widespread distribution. (Bintrim, S. B. et al. 1997. Molecular phylogeny of Archaea from soil. Proc. Natl. Acad. Sci.
USA 94, 277-282; Jurgens, G. et al. 1997. Novel group within the kingdom Crenarchaeota from boreal forest soil.
Appl. Env. Microb. 63, 803-805; Kudo, Y. et al. 1997. Peculiar archaea found in Japanese paddy soils. Biosci.
Biotech. Biochem. 61, 917-920; Ueda et al. 1995. Molecular phylogenetic analysis of a soil microbial community.
Eur. J. Soil Sci. 46, 415-421; Hershberger, K. L. et al. 1996. Wide diversity of Crenarchaeota. Nature 384, 420;
MacGregor, B. J. et al. 1997. Crenarchaeota in Lake Michigan sediment. Appl. Env. Microb. 63, 1178-1181; Schleper,
C. et al. 1997. Recovery of crenarchaeotal ribosomal DNA sequences from freshwater-lake sediments. Appl. Env.
Microb. 63, 321-323) The ecological distribution of these organisms was initially surprising, since their closest
cultivated relatives are all thermophilic or hyperthermophilic. No representative of this new archaeal group has yet
been obtained in pure culture, so the phenotypic and metabolic properties of these organisms, as well as their impact
on the environment and global nutrient cycling, remain unknown. Since growth temperature and habitat
characteristics vary so widely between the non-thermophilic and the hyperthermophilic Crenarchaeota, these groups are
likely to differ greatly with respect to their specific physiology and metabolism.

To gain a better perspective on the genetic and physiological characteristics of non-thermophilic
crenarchaeotes, a genomic study of Cenarchaeum symbiosum was begun. This archaeon lives in specific association
with the marine sponge Axinella mexicana off the coast of California, allowing access to relatively large amounts of
biomass from this species. (Preston, C. M. et al. 1996. A psychrophilic crenarchaeon inhabits a marine sponge:
Cenarchaeum symbiosum gen. nov., sp. nov. Proc. Natl. Acad. Sci. USA 93, 6241-6246) The approach taken herein
differs in several respects from the now-standard genomic characterization of cultivated organisms, and also from
comparable studies of uncultivated obligate parasites or symbionts. C. symbiosum has not been completely physically
separated from the tissues of its metazoan host. Therefore, its genetic material needs to be identified within the
context of complex genomic libraries that contain significant amounts of eucaryotic DNA, as well as DNA derived from
members of Bacteria.

Molecular phylogenetic surveys of mixed microbial populations have revealed the existence of many new
lineages undetected by classical microbiological approaches. (DeLong, E. F. 1997. Marine microbial diversity: the tip of
the iceberg. Tibtech 15, 2-9; Pace, N. R. 1997. A molecular view of microbial diversity and the biosphere. Science
276, 734-740) Furthermore, quantitative rRNA hybridization experiments demonstrate that some of these novel
prokaryotic groups represent major components of natural microbial communities. These molecular phylogenetic
approaches have altered current views of microbial diversity and ecology, and have demonstrated that traditional
cultivation techniques may recover only a small, skewed fraction of naturally occurring microbes. However,
phylogenetic identification using single gene sequences provides a limited perspective on other biological properties,
particularly for novel lineages only distantly related to cultivated and characterized organisms. Consequently,
additional approaches are necessary to better characterize ecologically abundant and potentially biotechnologically
useful microorganisms, many of which resist cultivation attempts.

Summary of the Invention

One embodiment of the present invention is an isolated, purified, or enriched nucleic acid comprising a 
sequence selected from the group consisting of SEQ ID NO: 1 and SEQ ID NO: 2, the sequences complementary to SEQ 
ID NO: 1 and SEQ ID NO: 2, fragments comprising at least 10 consecutive nucleotides of SEQ ID NO: 1 and SEQ ID 
NO: 2, and fragments comprising at least 10 consecutive nucleotides of the sequences complementary to SEQ ID NO:
1 and SEQ ID NO: 2. One aspect of the present invention is an isolated, purified, or enriched nucleic acid capable of
hybridizing to the nucleic acid of this embodiment under conditions of high stringency. Another aspect of the present
invention is an isolated, purified, or enriched nucleic acid capable of hybridizing to the nucleic acid of this embodiment
under conditions of moderate stringency. Another aspect of the present invention is an isolated, purified, or enriched
nucleic acid capable of hybridizing to the nucleic acid of this embodiment under conditions of low stringency. Another
aspect of the present invention is an isolated, purified, or enriched nucleic acid having at least 70% homology to the
nucleic acid of this embodiment as determined by analysis with BLASTN version 2.0 with the default parameters. 
Another aspect of the present invention is an isolated, purified, or enriched nucleic acid having at least 99% homology 
to the nucleic acid of this embodiment as determined by analysis with BLASTN version 2.0 with the default 
parameters. 
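
The notions of a complementary sequence and of a fragment of at least 10 consecutive nucleotides can be illustrated with a minimal Python sketch. The function names and the short reference string are hypothetical placeholders, not SEQ ID NO: 1 or SEQ ID NO: 2, and the exact-match test shown here is only an illustration, not a substitute for hybridization or BLASTN analysis.

```python
# Illustrative sketch only; the sequences used with these helpers are made up.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def contains_fragment(candidate: str, reference: str, min_len: int = 10) -> bool:
    """Return True if candidate contains at least min_len consecutive
    nucleotides of reference or of the reverse complement of reference."""
    candidate = candidate.upper()
    for target in (reference.upper(), reverse_complement(reference)):
        # Slide a window of min_len bases along the target strand.
        for start in range(len(target) - min_len + 1):
            if target[start:start + min_len] in candidate:
                return True
    return False
```

A candidate carrying any exact 10-base window of the reference, on either strand, satisfies the test; a candidate shorter than 10 bases never does.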

Another embodiment of the present invention is an isolated, purified, or enriched nucleic acid comprising a
sequence selected from the group consisting of SEQ ID NOs: 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 
65, 67, 71, 75, 79 and the sequences complementary thereto. One aspect of the present invention is an isolated, 
purified, or enriched nucleic acid capable of hybridizing to the nucleic acid of this embodiment under conditions of high 
stringency. Another aspect of the present invention is an isolated, purified, or enriched nucleic acid capable of
hybridizing to the nucleic acid of this embodiment under conditions of moderate stringency. Another aspect of the
present invention is an isolated, purified, or enriched nucleic acid capable of hybridizing to the nucleic acid of this 
embodiment under conditions of low stringency. Another aspect of the present invention is an isolated, purified, or 
enriched nucleic acid having at least 70% homology to the nucleic acid of this embodiment as determined by analysis 
with BLASTN version 2.0 with the default parameters. Another aspect of the present invention is an isolated,
purified, or enriched nucleic acid having at least 99% homology to the nucleic acid of this embodiment as determined
by analysis with BLASTN version 2.0 with the default parameters. 

Another embodiment of the present invention is an isolated, purified, or enriched nucleic acid comprising at 
least 10 consecutive bases of a sequence selected from the group consisting of SEQ ID NOs: 5, 9, 13, 25, 27, 29, 31, 
33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79 and the sequences complementary thereto. One aspect of the
present invention is an isolated, purified, or enriched nucleic acid having at least 70% homology to the nucleic acid of
this embodiment as determined by analysis with BLASTN version 2.0 with the default parameters. 

Another embodiment of the present invention is an isolated, purified, or enriched nucleic acid comprising a 
sequence selected from the group consisting of SEQ ID NOs: 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53,
55, 69, 73, 77 and the sequences complementary thereto. One aspect of the present invention is an isolated, purified,
or enriched nucleic acid capable of hybridizing to the nucleic acid of this embodiment under conditions of high
stringency. Another aspect of the present invention is an isolated, purified, or enriched nucleic acid capable of 
hybridizing to the nucleic acid of this embodiment under conditions of moderate stringency. Another aspect of the 
present invention is an isolated, purified, or enriched nucleic acid capable of hybridizing to the nucleic acid of this 
embodiment under conditions of low stringency. Another aspect of the present invention is an isolated, purified, or
enriched nucleic acid having at least 70% homology to the nucleic acid of this embodiment as determined by analysis
with BLASTN version 2.0 with the default parameters. Another aspect of the present invention is an isolated,
purified, or enriched nucleic acid having at least 99% homology to the nucleic acid of this embodiment as determined 
by analysis with BLASTN version 2.0 with the default parameters. 

Another embodiment of the present invention is an isolated, purified, or enriched nucleic acid comprising at
least 10 consecutive bases of a sequence selected from the group consisting of SEQ ID NOs: 3, 7, 11, 15, 17, 19, 21,
23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73, 77 and the sequences complementary thereto. One aspect of the present
invention is an isolated, purified, or enriched nucleic acid having at least 70% homology to the nucleic acid of this
embodiment as determined by analysis with BLASTN version 2.0 with the default parameters. Another aspect of the
present invention is an isolated, purified, or enriched nucleic acid having at least 99% homology to the nucleic acid of
this embodiment as determined by analysis with BLASTN version 2.0 with the default parameters.

Another embodiment of the present invention is an isolated, purified, or enriched nucleic acid encoding a 
polypeptide having a sequence selected from the group consisting of SEQ ID NOs: 6, 10, 14, 26, 28, 30, 32, 34, 38, 
42, 46, 58, 60, 62, 64, 66, 68, 72, 76, and 80. 

Another embodiment of the present invention is an isolated, purified, or enriched nucleic acid encoding a
polypeptide comprising at least 10 consecutive amino acids of a polypeptide having a sequence selected from the
group consisting of SEQ ID NOs: 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, and 80.

Another embodiment of the present invention is an isolated, purified, or enriched nucleic acid encoding a 
polypeptide having a sequence selected from the group consisting of SEQ ID NOs: 4, 8, 12, 16, 18, 20, 22, 24, 36, 
40, 44, 48, 50, 52, 54, 56, 70, 74, and 78. 

Another embodiment of the present invention is an isolated, purified, or enriched nucleic acid encoding a
polypeptide comprising at least 10 consecutive amino acids of a polypeptide having a sequence selected from the 
group consisting of SEQ ID NOs: 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78. 

Another embodiment of the present invention is an isolated or purified polypeptide comprising a sequence 
selected from the group consisting of SEQ ID NOs: 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68,
72, 76, and 80. Another aspect of the present invention is an isolated or purified polypeptide comprising at least 10
consecutive amino acids of the polypeptides of this embodiment. Another aspect of the present invention is an
isolated or purified polypeptide having at least 70% homology to the polypeptide of this embodiment as determined by
analysis with FASTA version 3.0t78 with the default parameters. Another aspect of the present invention is an
isolated or purified polypeptide having at least 99% homology to the polypeptide of this embodiment as determined by
analysis with FASTA version 3.0t78 with the default parameters. Another aspect of the present invention is an
isolated or purified polypeptide having at least 70% homology to an isolated or purified polypeptide comprising at least
10 consecutive amino acids of the polypeptides of this embodiment as determined by analysis with FASTA version
3.0t78 with the default parameters. Another aspect of the present invention is an isolated or purified polypeptide
having at least 99% homology to an isolated or purified polypeptide comprising at least 10
consecutive amino acids of the polypeptides of this embodiment as determined by analysis with FASTA version 3.0t78
with the default parameters.

Another aspect of the present invention is an isolated or purified polypeptide comprising a sequence selected
from the group consisting of SEQ ID NOs: 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and
78. One aspect of the present invention is an isolated or purified polypeptide comprising at least 10 consecutive
amino acids of the polypeptides of this embodiment. Another aspect of the present invention is an isolated or purified
polypeptide having at least 70% homology to the polypeptides of this embodiment as determined by analysis with
FASTA version 3.0t78 with the default parameters. Another aspect of the present invention is an isolated or purified
polypeptide having at least 99% homology to the polypeptides of this embodiment as determined by analysis with
FASTA version 3.0t78 with the default parameters. Another aspect of the present invention is an isolated or purified
polypeptide having at least 70% homology to an isolated or purified polypeptide comprising at least 10 consecutive
amino acids of the polypeptides of this embodiment as determined by analysis with FASTA version 3.0t78 with the
default parameters. Another aspect of the present invention is an isolated or purified polypeptide having at least 99%
homology to an isolated or purified polypeptide comprising at least 10 consecutive amino acids of the polypeptides of
this embodiment as determined by analysis with FASTA version 3.0t78 with the default parameters.

Another embodiment of the present invention is an isolated or purified antibody capable of specifically
binding to a polypeptide comprising a sequence selected from the group consisting of SEQ ID NOs: 6, 10, 14, 26, 28,
30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, and 80.

Another embodiment of the present invention is an isolated or purified antibody capable of specifically
binding to a polypeptide comprising at least 10 consecutive amino acids of one of the polypeptides of SEQ ID NOs: 6,
10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, and 80.

Another embodiment of the present invention is an isolated or purified antibody capable of specifically
binding to a polypeptide having a sequence selected from the group consisting of SEQ ID NOs: 4, 8, 12, 16, 18, 20,
22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78.

Another embodiment of the present invention is an isolated or purified antibody capable of specifically
binding to a polypeptide comprising at least 10 consecutive amino acids of one of the polypeptides of SEQ ID NOs: 4,
8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78.

Another embodiment of the present invention is a method of making a polypeptide having a sequence
selected from the group consisting of SEQ ID NOs: 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68,
72, 76, and 80 comprising introducing a nucleic acid encoding said polypeptide, said nucleic acid being operably linked
to a promoter, into a host cell.

Another embodiment of the present invention is a method of making a polypeptide comprising at least 10
amino acids of a sequence selected from the group consisting of the sequences of SEQ ID NOs: 6, 10, 14, 26, 28, 30,
32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, and 80 comprising introducing a nucleic acid encoding said
polypeptide, said nucleic acid being operably linked to a promoter, into a host cell.

Another embodiment of the present invention is a method of making a polypeptide having a sequence
selected from the group consisting of SEQ ID NOs: 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 
74, and 78 comprising introducing a nucleic acid encoding said polypeptide, said nucleic acid being operably linked to a 
promoter, into a host cell. 

Another embodiment of the present invention is a method of making a polypeptide comprising at least 10
amino acids of a sequence selected from the group consisting of the sequences of SEQ ID NOs: 4, 8, 12, 16, 18, 20,
22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78 comprising introducing a nucleic acid encoding said
polypeptide, said nucleic acid being operably linked to a promoter, into a host cell.

Another embodiment of the present invention is a method of generating a variant comprising obtaining a nucleic acid
comprising a sequence selected from the group consisting of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41,
45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77,
the sequences complementary to the sequences of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59,
61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77, fragments
comprising at least 30 consecutive nucleotides of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59,
61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77, and
fragments comprising at least 30 consecutive nucleotides of the sequences complementary to SEQ ID NOs. 1, 2, 5, 9,
13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47,
49, 51, 53, 55, 69, 73 and 77 and changing one or more nucleotides in said sequence to another nucleotide, deleting
one or more nucleotides in said sequence, or adding one or more nucleotides to said sequence. In one aspect of the
present invention, the method further comprises the step of testing the enzymatic properties of a translation product
of said variant.
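
The three kinds of change named in this embodiment (changing, deleting, or adding nucleotides) can be sketched in silico as simple string operations. The following Python sketch is illustrative only: the function names are hypothetical, positions are 0-based, and the input string used with it is a made-up placeholder rather than any sequence from the sequence listing.

```python
# Illustrative sketch of the three variant-generating operations.
VALID_BASES = set("ACGT")


def _check_bases(bases: str) -> None:
    """Raise ValueError if bases contains anything other than A/C/G/T."""
    if not bases or set(bases) - VALID_BASES:
        raise ValueError("expected a non-empty string of A, C, G, T")


def substitute(seq: str, pos: int, base: str) -> str:
    """Change the nucleotide at 0-based position pos to another nucleotide."""
    _check_bases(base)
    if len(base) != 1 or not 0 <= pos < len(seq):
        raise ValueError("single base and in-range position required")
    return seq[:pos] + base + seq[pos + 1:]


def delete(seq: str, pos: int, length: int = 1) -> str:
    """Delete length nucleotides starting at 0-based position pos."""
    if length < 1 or pos < 0 or pos + length > len(seq):
        raise ValueError("deletion must lie within the sequence")
    return seq[:pos] + seq[pos + length:]


def insert(seq: str, pos: int, bases: str) -> str:
    """Add one or more nucleotides before 0-based position pos."""
    _check_bases(bases)
    if not 0 <= pos <= len(seq):
        raise ValueError("insertion point must lie within the sequence")
    return seq[:pos] + bases + seq[pos:]
```

For example, applying `substitute("ATGCCGTA", 3, "T")` to the placeholder string yields `"ATGTCGTA"`.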

Another embodiment of the present invention is a computer readable medium having stored thereon a sequence 
selected from the group consisting of a nucleic acid code of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45,
57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77 and a
polypeptide code of SEQ ID NOs. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, 80, 4, 8,
12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78. 

Another embodiment of the present invention is a computer system comprising a processor and a data storage 
device wherein said data storage device has stored thereon a sequence selected from the group consisting of a nucleic acid 
code of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15,
17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77 and a polypeptide code of SEQ ID NOs. 6, 10, 14, 26,
28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, 80, 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52,
54, 56, 70, 74, and 78. In one aspect of the present invention, the computer system further comprises a sequence
comparer and a data storage device having reference sequences stored thereon. For example, the sequence comparer may
comprise a computer program which indicates polymorphisms. In another aspect of the present invention, the computer
system of this embodiment further comprises an identifier which identifies features in said sequence.

Another embodiment of the present invention is a method for comparing a first sequence to a reference sequence
wherein said first sequence is selected from the group consisting of a nucleic acid code of SEQ ID NOs. 1, 2, 5, 9, 13, 25,
27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51,
53, 55, 69, 73 and 77 and a polypeptide code of SEQ ID NOs. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62,
64, 66, 68, 72, 76, 80, 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78 comprising the
steps of reading said first sequence and said reference sequence through use of a computer program which compares 
sequences; and determining differences between said first sequence and said reference sequence with said computer 
program. In one aspect of the present invention, the step of determining differences between the first sequence and the 
reference sequence comprises identifying polymorphisms. 
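
The comparison step can be sketched as follows. This minimal Python example assumes the two sequences have already been aligned to equal length (a simplification; no alignment program is named in this passage), and the function name and example strings are hypothetical. It reports each differing position as a candidate polymorphism.

```python
# Illustrative sketch: position-by-position comparison of two pre-aligned sequences.
from typing import List, Tuple


def find_differences(reference: str, query: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference residue, query residue) for every
    position at which two pre-aligned, equal-length sequences differ."""
    if len(reference) != len(query):
        raise ValueError("sequences must be pre-aligned to the same length")
    differences = []
    for index, (ref_base, query_base) in enumerate(zip(reference.upper(), query.upper())):
        if ref_base != query_base:
            differences.append((index + 1, ref_base, query_base))
    return differences
```

The same routine works for nucleotide or polypeptide codes, since it compares characters without interpreting them.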

Another embodiment of the present invention is a method for identifying a feature in a sequence selected from
the group consisting of a nucleic acid code of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63,
65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77 and a polypeptide
code of SEQ ID NOs. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, 80, 4, 8, 12, 16, 18,
20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78 comprising the steps of reading said sequence through the
use of a computer program which identifies features in sequences and identifying features in said sequence with said
computer program. 
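
One kind of feature such a program might identify is an open reading frame. The Python sketch below is a deliberately simplified, hypothetical identifier: it scans only the three forward frames, treats ATG as the sole start codon (archaeal genes may also use alternative starts), and the function name and example string are assumptions, not part of the described system.

```python
# Illustrative sketch: forward-strand open reading frame (ORF) detection.
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) coordinates of ORFs that run
    from an ATG through the first in-frame stop codon, inclusive."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the frame one codon at a time.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)
```

On the placeholder string `"CCATGAAATTTTAGGG"`, the only ORF found spans `ATGAAATTTTAG`.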

Brief Description of the Drawings

Figure 1 shows the locations of coding regions, the %G-C, and the %DNA identity between the
approximately 28 kb of common sequence in fosmids 101G10 and 60A5.

Figure 2 shows the sequences surrounding the TATA boxes of several promoters from Cenarchaeum
symbiosum and the distances from the TATA boxes to the initiation codons in these sequences.

Figure 3 is a block diagram of an exemplary computer system.

Figure 4 is a flow diagram illustrating one embodiment of a process 200 for comparing a new nucleotide or
protein sequence with a database of sequences in order to determine the homology levels between the new sequence and
the sequences in the database.

Figure 5 is a flow diagram illustrating one embodiment of a process 250 in a computer for determining
whether two sequences are homologous.

Figure 6 is a flow diagram illustrating one embodiment of an identifier process for detecting the presence of
a feature in a sequence.

Definitions 

The term "gene" means the segment of DNA involved in producing a polypeptide chain; it includes regions 
preceding and following the coding region (leader and trailer) as well as, where applicable, intervening sequences 
(introns) between individual coding segments (exons).

As used herein, the term "isolated" means that the material is removed from its original environment (e.g.,
the natural environment if it is naturally occurring). For example, a naturally-occurring polynucleotide or polypeptide
present in a living animal is not isolated, but the same polynucleotide or polypeptide, separated from some or all of the 
coexisting materials in the natural system, is isolated. Such polynucleotides could be part of a vector and/or such 
polynucleotides or polypeptides could be part of a composition, and still be isolated in that such vector or composition 
is not part of its natural environment. 

As used herein, the term "purified" does not require absolute purity; rather, it is intended as a relative definition. 
Individual nucleic acids obtained from a library have been conventionally purified to electrophoretic homogeneity. The
sequences obtained from these clones could not be obtained directly either from the library or from total human DNA. The
purified nucleic acids of the present invention have been purified from the remainder of the genomic DNA in the organism
by at least 10^4 to 10^6 fold. However, the term "purified" also includes nucleic acids which have been purified from the
remainder of the genomic DNA or from other sequences in a library or other environment by at least one order of 
magnitude, preferably two or three orders, and more preferably four or five orders of magnitude. 

As used herein, the term "recombinant" means that the nucleic acid is adjacent to "backbone" nucleic acid to 
which it is not adjacent in its natural environment. Additionally, to be "enriched" the nucleic acids will represent 5% or 
more of the number of nucleic acid inserts in a population of nucleic acid backbone molecules. Backbone molecules 
according to the present invention include nucleic acids such as expression vectors, self-replicating nucleic acids, viruses, 
integrating nucleic acids, and other vectors or nucleic acids used to maintain or manipulate a nucleic acid insert of interest.
Preferably, the enriched nucleic acids represent 15% or more of the number of nucleic acid inserts in the population of
recombinant backbone molecules. More preferably, the enriched nucleic acids represent 50% or more of the number of
nucleic acid inserts in the population of recombinant backbone molecules. In a highly preferred embodiment, the enriched
nucleic acids represent 90% or more of the number of nucleic acid inserts in the population of recombinant backbone
molecules. 
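
The enrichment thresholds in this definition (5%, 15%, 50%, and 90% of inserts) reduce to a simple fraction check. The Python sketch below is illustrative only; the function name and the tier labels are hypothetical shorthand for the wording of the definition.

```python
# Illustrative sketch: classify an insert population against the stated thresholds.
def enrichment_tier(inserts_of_interest: int, total_inserts: int) -> str:
    """Map the share of inserts of interest to the tiers in the definition."""
    if total_inserts <= 0 or not 0 <= inserts_of_interest <= total_inserts:
        raise ValueError("counts must satisfy 0 <= inserts_of_interest <= total_inserts, total > 0")
    # Integer arithmetic avoids floating-point edge cases at the thresholds.
    percent_times_total = inserts_of_interest * 100
    if percent_times_total >= 90 * total_inserts:
        return "highly preferred (90% or more)"
    if percent_times_total >= 50 * total_inserts:
        return "more preferred (50% or more)"
    if percent_times_total >= 15 * total_inserts:
        return "preferred (15% or more)"
    if percent_times_total >= 5 * total_inserts:
        return "enriched (5% or more)"
    return "not enriched"
```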

A promoter sequence is "operably linked to" a coding sequence when RNA polymerase which initiates 
transcription at the promoter will transcribe the coding sequence into mRNA. 

"Recombinant" polypeptides or proteins refer to polypeptides or proteins produced by recombinant DNA 
techniques; i.e., produced from cells transformed by an exogenous DNA construct encoding the desired polypeptide or 
protein. "Synthetic" polypeptides or proteins are those prepared by chemical synthesis.

A DNA "coding sequence" or a "nucleotide sequence encoding" a particular polypeptide or protein, is a DNA 
sequence which is transcribed and translated into a polypeptide or protein when placed under the control of 
appropriate regulatory sequences. 

"Plasmids" are designated by a lower case p preceded and/or followed by capital letters and/or numbers. 
The starting plasmids herein are either commercially available, publicly available on an unrestricted basis, or can be 
constructed from available plasmids in accord with published procedures. In addition, equivalent plasmids to those 
described herein are known in the art and will be apparent to the ordinarily skilled artisan. 

"Digestion" of DNA refers to catalytic cleavage of the DNA with a restriction enzyme that acts only at 
certain sequences in the DNA. The various restriction enzymes used herein are commercially available and their
reaction conditions, cofactors and other requirements were used as would be known to the ordinarily skilled artisan.
For analytical purposes, typically 1 µg of plasmid or DNA fragment is used with about 2 units of enzyme in about 20 µl
of buffer solution. For the purpose of isolating DNA fragments for plasmid construction, typically 5 to 50 µg of DNA
are digested with 20 to 250 units of enzyme in a larger volume. Appropriate buffers and substrate amounts for
particular restriction enzymes are specified by the manufacturer. Incubation times of about 1 hour at 37°C are
ordinarily used, but may vary in accordance with the supplier's instructions. After digestion, gel electrophoresis
may be performed to isolate the desired fragment.

"Oligonucleotide" refers to either a single stranded polydeoxynucleotide or two complementary 
polydeoxynucleotide strands which may be chemically synthesized. Such synthetic oligonucleotides have no 5'
phosphate and thus will not ligate to another oligonucleotide without adding a phosphate with an ATP in the presence 
of a kinase. A synthetic oligonucleotide will ligate to a fragment that has not been dephosphorylated. 

Detailed Description of the Preferred Embodiment 

In order to begin the characterization of Cenarchaeum symbiosum, a large region of the C. symbiosum
genome was sequenced. In particular, two overlapping C. symbiosum fosmid inserts of approximately 42 kb
and 33 kb were sequenced. The sequences of the two fosmid inserts revealed that there are at least two major
variants or strains of C. symbiosum that coexist inside the sponge tissues of a single sponge. This complexity of the
C. symbiosum population was not detected in initial studies based solely on direct sequencing of PCR amplified SSU
genes. (Preston, C. M. et al. 1996. A psychrophilic crenarchaeon inhabits a marine sponge: Cenarchaeum symbiosum
gen. nov., sp. nov. Proc. Natl. Acad. Sci. USA 93, 6241-6246) This natural variation would also have been lost upon
isolation of a pure culture. 

The Cenarchaeum symbiosum sequences obtained from the two fosmids containing overlapping genomic 
inserts are provided in the accompanying sequence listing and are identified as SEQ ID NO: 1 and SEQ ID NO: 2. The 
two fosmid sequences were not entirely identical in their overlapping portions but instead contained differences. Upon
further investigation, it was discovered that the two fosmid sequences were derived from two different, but closely 
related, strains of Cenarchaeum symbiosum (called variant A and variant B) which may simultaneously inhabit a single 
sponge. 

Within the sequences of the fosmid inserts, numerous open reading frames encoding polypeptides having
homology to known proteins, as well as open reading frames encoding proteins which do not exhibit homology to
known proteins, were identified. Homology was determined using the program FASTA with the default parameters.
The polypeptides encoded by these sequences are identified in the accompanying sequence listing as SEQ ID NOs: 6,
10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76 and 80 (polypeptides with homology to known
proteins) and SEQ ID NOs: 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74 and 78 (polypeptides
without homology to known proteins). In addition, sequences encoding the 16S rRNA, the 23S rRNA and a tyrosine
tRNA were also identified.

One aspect of the present invention is an isolated, purified, or enriched nucleic acid comprising one of the
sequences of SEQ ID NOs: 1, 2, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45,
47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 and 79, the sequences complementary thereto, or a
fragment comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, or 500 consecutive bases of
one of the sequences of SEQ ID NOs: 1, 2, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41,
43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 and 79 or the sequences complementary
thereto. The isolated, purified or enriched nucleic acids may comprise DNA, including cDNA, genomic DNA, and
synthetic DNA. The DNA may be double-stranded or single-stranded, and if single stranded may be the coding strand
or non-coding (anti-sense) strand. Alternatively, the isolated, purified or enriched nucleic acids may comprise RNA.
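
By way of illustration only, the notions of a "sequence complementary thereto" and of a "fragment comprising at least N consecutive bases" may be sketched in Python as follows. The function names and the 15-base example sequence are hypothetical and are not taken from the sequence listing.

```python
# Illustrative helpers only; the example sequence is a made-up placeholder,
# not a sequence from the sequence listing.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def fragments(seq: str, length: int) -> list:
    """Return every run of `length` consecutive bases contained in `seq`."""
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


example = "ATGGCCAAGTTCGAT"  # hypothetical 15-base sequence
print(reverse_complement(example))
print(len(fragments(example, 10)))  # number of 10-base fragments
```

A sequence shorter than the requested fragment length simply yields no fragments.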

As discussed in more detail below, the isolated, purified, or enriched nucleic acids of one of SEQ ID NOs: 1,
2, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61,
63, 65, 67, 69, 71, 73, 75, 77 and 79 may be used to prepare one of the polypeptides of SEQ ID NOs: 4, 6, 8, 10,
12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70,
72, 74, 76, 78, and 80 or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive
amino acids of one of the polypeptides of SEQ ID NOs: 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34,
36, 38, 40, 42, 44, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, and 80.

Accordingly, another aspect of the present invention is an isolated, purified, or enriched nucleic acid which
encodes one of the polypeptides of SEQ ID NOs: 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38,
40, 42, 44, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, and 80 or fragments comprising at least 5,
10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids of one of the polypeptides of SEQ ID NOs: 4,
6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66,
68, 70, 72, 74, 76, 78, and 80. The coding sequences of these nucleic acids may be identical to one of the coding
sequences of one of the nucleic acids of SEQ ID NOs: 1, 2, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33,
35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 and 79 or a fragment thereof
or may be different coding sequences which encode one of the polypeptides of SEQ ID NOs: 4, 6, 8, 10, 12, 14, 16,
18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76,
78, and 80 or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino
acids of one of the polypeptides of SEQ ID NOs: 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38,
40, 42, 44, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, and 80 as a result of the redundancy or
degeneracy of the genetic code. The genetic code is well known to those of skill in the art and can be obtained, for
example, on page 214 of B. Lewin, Genes VI, Oxford University Press, 1997.
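
The degeneracy referred to above may be illustrated with a short Python sketch showing two different coding sequences that encode the same peptide. The sketch assumes the standard genetic code; the two 12-base sequences and the function name `translate` are hypothetical and are not drawn from the sequence listing.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# the standard table applies to the coding sequence being translated).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate a coding sequence codon by codon; trailing partial codons are ignored."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


# Two hypothetical coding sequences that differ at three codons.
coding_1 = "ATGGCTAAACTG"
coding_2 = "ATGGCCAAGTTA"
print(translate(coding_1), translate(coding_2))  # both encode the same peptide
```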

The isolated, purified, or enriched nucleic acid which encodes one of the polypeptides of SEQ ID NOs: 4, 6, 8,
10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68,
70, 72, 74, 76, 78, and 80 may include, but is not limited to: only the coding sequence of one of SEQ ID NOs: 1, 2, 3,
5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63,
65, 67, 69, 71, 73, 75, 77 and 79; the coding sequences of SEQ ID NOs: 1, 2, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23,
25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 and 79
and additional coding sequences, such as leader sequences or proprotein sequences; or the coding sequences of SEQ
ID NOs: 1, 2, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55,
57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 and 79 and non-coding sequences, such as introns or non-coding
sequences 5' and/or 3' of the coding sequence. Thus, as used herein, the term "polynucleotide encoding a
polypeptide" encompasses a polynucleotide which includes only coding sequence for the polypeptide as well as a
polynucleotide which includes additional coding and/or non-coding sequence.

Alternatively, the nucleic acid sequences of SEQ ID NOs: 1, 2, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27,
29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 and 79 may be
mutagenized using conventional techniques, such as site directed mutagenesis, or other techniques familiar to those
skilled in the art, to introduce silent changes into the polynucleotides of SEQ ID NOs: 1, 2, 3, 5, 7, 9, 11, 13, 15, 17,
19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75,
77 and 79. As used herein, "silent changes" include, for example, changes which do not alter the amino acid sequence
encoded by the polynucleotide. Such changes may be desirable in order to increase the level of the polypeptide
produced by host cells containing a vector encoding the polypeptide by introducing codons or codon pairs which occur
frequently in the host organism.
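
As a minimal sketch of how silent changes might be planned in silico, the following Python example replaces codons with synonymous codons taken from a table of host-preferred codons and confirms that the encoded amino acid sequence is unchanged. The standard genetic code is assumed, and the preferred-codon table, the example coding sequence, and the function names are hypothetical placeholders rather than data for any particular host.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumption).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate a coding sequence whose length is a multiple of three."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def introduce_silent_changes(cds: str, preferred: dict) -> str:
    """Swap each codon for the preferred synonymous codon, if one is listed."""
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    recoded = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        replacement = preferred.get(amino_acid, codon)
        # Guard: a replacement is only silent if it encodes the same amino acid.
        if CODON_TABLE[replacement] != amino_acid:
            raise ValueError(f"{replacement} does not encode {amino_acid}")
        recoded.append(replacement)
    return "".join(recoded)


# Hypothetical preferred codons for an unspecified host organism.
HYPOTHETICAL_PREFERRED = {"L": "CTG", "A": "GCG", "K": "AAA"}

original = "ATGGCCAAGTTA"  # hypothetical coding sequence
recoded = introduce_silent_changes(original, HYPOTHETICAL_PREFERRED)
print(recoded, translate(recoded) == translate(original))
```

Checking the translation after recoding is the step that makes the change "silent" in the sense used above.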

The present invention also relates to polynucleotides which have nucleotide changes which result in amino
acid substitutions, additions, deletions, fusions and truncations in the polypeptides of SEQ ID NOs: 4, 6, 8, 10, 12, 14,
16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74,
76, 78, and 80. Such nucleotide changes may be introduced using techniques such as site directed mutagenesis,
random chemical mutagenesis, exonuclease III deletion, and other recombinant DNA techniques. Alternatively, such
nucleotide changes may be naturally occurring allelic variants which are isolated by identifying nucleic acids from
Cenarchaeum symbiosum or related organisms which specifically hybridize, under conditions of high, moderate, or low
stringency as provided herein, to probes comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300,
400, or 500 consecutive bases of one of the sequences of SEQ ID NOs: 1, 2, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25,
27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 and 79 or the
sequences complementary thereto.

The isolated, purified, or enriched nucleic acids of SEQ ID NOs: 1, 2, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23,
25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 and 79,
the sequences complementary thereto, or a fragment comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150,
200, 300, 400, or 500 consecutive bases of one of the sequences of SEQ ID NOs: 1, 2, 3, 5, 7, 9, 11, 13, 15, 17, 19,
21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77
and 79 or the sequences complementary thereto may also be used as probes to identify the presence of Cenarchaeum
symbiosum in a biological sample. In such procedures, a biological sample potentially harboring Cenarchaeum
symbiosum is obtained and nucleic acids are obtained from the sample. The nucleic acids are contacted with the
probe under conditions which permit the probe to specifically hybridize to any complementary sequences from
Cenarchaeum symbiosum which are present therein.

Where necessary, conditions which permit the probe to specifically hybridize to complementary sequences
from Cenarchaeum symbiosum may be determined by placing the probe in contact with complementary sequences
from Cenarchaeum symbiosum as well as control sequences which are not from Cenarchaeum symbiosum. In some
analyses, the control sequences may be from organisms related to Cenarchaeum symbiosum. Alternatively, the
control sequences may be from organisms which are not related to Cenarchaeum symbiosum. Hybridization
conditions, such as the salt concentration of the hybridization buffer, the formamide concentration of the hybridization
buffer, or the hybridization temperature, may be varied to identify conditions which allow the probe to hybridize
specifically to nucleic acids from Cenarchaeum symbiosum.

If the sample contains nucleic acids from Cenarchaeum symbiosum, specific hybridization of the probe to the
nucleic acids from Cenarchaeum symbiosum is then detected. Hybridization may be detected by labeling the probe
with a detectable agent such as a radioactive isotope, a fluorescent dye or an enzyme capable of catalyzing the
formation of a detectable product.

Many methods for using the labeled probes to detect the presence of nucleic acids from Cenarchaeum
symbiosum in a sample are familiar to those skilled in the art. These include Southern Blots, Northern Blots, colony
hybridization procedures, and dot blots. Protocols for each of these procedures are provided in Ausubel et al., Current
Protocols in Molecular Biology, John Wiley & Sons, Inc. 1997 and Sambrook et al., Molecular Cloning: A Laboratory
Manual 2d Ed., Cold Spring Harbor Laboratory Press, 1989.

Alternatively, more than one probe (at least one of which is capable of specifically hybridizing to any
complementary sequences from Cenarchaeum symbiosum which are present in the nucleic acid sample) may be used
in an amplification reaction to determine whether the nucleic acid sample contains nucleic acids from Cenarchaeum
symbiosum. Preferably, the probes comprise oligonucleotides. In one embodiment, the amplification reaction may
comprise a PCR reaction. PCR protocols are described in Ausubel and Sambrook, supra. Alternatively, the
amplification may comprise a ligase chain reaction, 3SR, or strand displacement reaction. (See Barany, F., "The Ligase
Chain Reaction in a PCR World", PCR Methods and Applications 1:5-16 (1991); E. Fahy et al., "Self-sustained Sequence
Replication (3SR): An Isothermal Transcription-based Amplification System Alternative to PCR", PCR Methods and
Applications 1:25-33 (1991); and Walker G.T. et al., "Strand Displacement Amplification-an Isothermal in vitro DNA
Amplification Technique", Nucleic Acids Research 20:1691-1696 (1992)). In such procedures, the nucleic acids in the sample
are contacted with the probes, the amplification reaction is performed, and any resulting amplification product is detected.
The amplification product may be detected by performing gel electrophoresis on the reaction products and staining the gel
with an intercalator such as ethidium bromide. Alternatively, one or more of the probes may be labeled with a radioactive
isotope and the presence of a radioactive amplification product may be detected by autoradiography after gel
electrophoresis.

Probes derived from sequences near the ends of the sequences of SEQ ID NOs: 1 and 2 may also be used in
chromosome walking procedures to identify clones containing genomic sequences located adjacent to the sequences of
SEQ ID NOs: 1 and 2. Such methods allow the isolation of genes which encode additional proteins expressed in
Cenarchaeum symbiosum and facilitate the further physiological characterization of the organism.

Another aspect of the present invention is a method for determining whether a sample contains variant A
and/or variant B of Cenarchaeum symbiosum. In such procedures, a sample potentially harboring variant A and/or
variant B Cenarchaeum symbiosum is obtained and nucleic acids are obtained from the sample. The nucleic acids are
contacted with the probe under conditions which permit the probe to specifically hybridize to any complementary
sequences from variant A or variant B of Cenarchaeum symbiosum which are present therein. Preferably, the probe
comprises a sequence having one or more nucleotides which differ between variant A and variant B. Conditions in
which the probe specifically hybridizes to nucleic acids from one of the variants but not to nucleic acids from the other
variant may be determined by contacting the probe with its corresponding sequence from variant A and variant B and
varying the hybridization conditions, such as the salt concentration of the hybridization buffer, the formamide
concentration of the buffer, or the hybridization temperature, to identify conditions in which the probe hybridizes to
the corresponding sequence from one variant but not to the corresponding sequence from the other variant.
Hybridization of the probe to nucleic acids from the Cenarchaeum symbiosum variant is then detected using any of the
procedures described above.
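
One way to locate candidate variant-discriminating probes, offered here only as a sketch, is to compare corresponding (already aligned, equal-length) regions of the two variants and take a window around each differing nucleotide. The two short sequences below are hypothetical stand-ins for corresponding regions of variant A and variant B, and the function names and the 20-base window are assumptions.

```python
def differing_positions(variant_a: str, variant_b: str) -> list:
    """Return 0-based positions at which two aligned sequences differ."""
    if len(variant_a) != len(variant_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(variant_a, variant_b)) if a != b]


def candidate_probe(seq: str, position: int, length: int = 20) -> str:
    """Return a window of `length` bases that contains `position`."""
    start = max(0, min(position - length // 2, len(seq) - length))
    return seq[start:start + length]


# Hypothetical corresponding regions differing at a single nucleotide.
region_a = "ACGTACGTACGTACGTACGTACGTACGT"
region_b = region_a[:13] + "T" + region_a[14:]

for pos in differing_positions(region_a, region_b):
    print(pos, candidate_probe(region_a, pos), candidate_probe(region_b, pos))
```

Whether such a window actually discriminates between the variants would still have to be established empirically by varying the hybridization conditions as described above.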

The isolated, purified, or enriched nucleic acids of SEQ ID NOs: 1, 2, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23,
25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 and 79,
the sequences complementary thereto, or a fragment comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150,
200, 300, 400, or 500 consecutive bases of one of the sequences of SEQ ID NOs: 1, 2, 3, 5, 7, 9, 11, 13, 15, 17, 19,
21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77
and 79 or the sequences complementary thereto may be used as probes to identify and isolate cDNAs encoding the
polypeptides of SEQ ID NOs: 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 48, 50,
52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, and 80. In such procedures, a cDNA library is constructed
from a sample containing Cenarchaeum symbiosum. The cDNA library is then contacted with a probe comprising a
coding sequence, or a fragment of a coding sequence, encoding one of the polypeptides of SEQ ID NOs: 4, 6, 8, 10,
12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70,
72, 74, 76, 78, and 80 or a fragment thereof under conditions which permit the probe to specifically hybridize to
sequences complementary thereto. cDNAs which hybridize to the probe are then detected and isolated. Procedures
for preparing and identifying cDNAs are disclosed in Ausubel et al., Current Protocols in Molecular Biology, John Wiley &
Sons, Inc. 1997 and Sambrook et al., Molecular Cloning: A Laboratory Manual 2d Ed., Cold Spring Harbor Laboratory
Press, 1989.

The isolated, purified, or enriched nucleic acids of SEQ ID NOs: 1, 2, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23,
25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 and 79,
the sequences complementary thereto, or a fragment comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150,
200, 300, 400, or 500 consecutive bases of one of the sequences of SEQ ID NOs: 1, 2, 3, 5, 7, 9, 11, 13, 15, 17, 19,
21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77
and 79 or the sequences complementary thereto may be used as probes to identify and isolate related nucleic acids.
In some embodiments, the related nucleic acids may be cDNAs or genomic DNAs from organisms other than
Cenarchaeum symbiosum. For example, the other organisms may be organisms which are related to Cenarchaeum
symbiosum. In such procedures, a nucleic acid sample containing nucleic acids from the related organism, such as a
cDNA or genomic DNA library from the related organism, is contacted with the probe under conditions which permit
the probe to specifically hybridize to related sequences. Hybridization of the probe to nucleic acids from the related
organism is then detected using any of the methods described above.

Hybridization may be carried out under conditions of low stringency, moderate stringency or high stringency.
As an example of nucleic acid hybridization, a polymer membrane containing immobilized denatured nucleic acids is
first prehybridized for 30 minutes at 45°C in a solution consisting of 0.9 M NaCl, 50 mM NaH₂PO₄, pH 7.0, 5.0 mM
Na₂EDTA, 0.5% SDS, 10X Denhardt's, and 0.5 mg/ml polyriboadenylic acid. Approximately 2 × 10⁷ cpm (specific
activity 4-9 × 10⁸ cpm/µg) of ³²P end-labeled oligonucleotide probe are then added to the solution. After 12-16 hours
of incubation, the membrane is washed for 30 minutes at room temperature in 1X SET (150 mM NaCl, 20 mM Tris
hydrochloride, pH 7.8, 1 mM Na₂EDTA) containing 0.5% SDS, followed by a 30 minute wash in fresh 1X SET at
Tm-10°C for the oligonucleotide probe. The membrane is then exposed to autoradiographic film for detection of
hybridization signals.

By varying the stringency of the hybridization conditions used to identify nucleic acids, such as cDNAs or 
genomic DNAs, which hybridize to the detectable probe, nucleic acids having different levels of homology to the probe can 
be identified and isolated. Stringency may be varied by conducting the hybridization at varying temperatures below the 
melting temperatures of the probes. The melting temperature of the probe may be calculated using the following formulas: 

For probes between 14 and 70 nucleotides in length the melting temperature (Tm) is calculated using the
formula: $T_m = 81.5 + 16.6(\log [\mathrm{Na}^+]) + 0.41(\text{fraction G+C}) - (600/N)$ where N is the length of the probe.

If the hybridization is carried out in a solution containing formamide, the melting temperature may be calculated
using the equation $T_m = 81.5 + 16.6(\log [\mathrm{Na}^+]) + 0.41(\text{fraction G+C}) - (0.63\,\%\text{ formamide}) - (600/N)$ where N is the length of
the probe.
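
The two melting temperature formulas may be evaluated with a short Python sketch such as the one below. Two points are assumptions rather than statements of the text: the logarithm is taken as base 10, and the G+C term is entered as a percentage (0 to 100), which is the conventional form of this equation with the 0.41 coefficient, although the text above writes "fraction G+C". The function names and the 20-base probe are hypothetical.

```python
import math


def gc_percent(probe: str) -> float:
    """Percentage of G and C bases in the probe (0 to 100)."""
    probe = probe.upper()
    return 100.0 * sum(base in "GC" for base in probe) / len(probe)


def melting_temperature(probe: str, na_molar: float,
                        formamide_percent: float = 0.0) -> float:
    """Tm in degrees C from the formulas above.

    Assumptions: log is base 10, and G+C content is expressed as a percentage.
    With formamide_percent = 0 this reduces to the formamide-free formula.
    """
    n = len(probe)
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent(probe)
            - 0.63 * formamide_percent
            - 600.0 / n)


probe = "ATGCATGCATGCATGCATGC"  # hypothetical 20-mer with 50% G+C
print(round(melting_temperature(probe, na_molar=1.0), 1))
print(round(melting_temperature(probe, na_molar=1.0, formamide_percent=50.0), 1))
```

For this hypothetical probe at 1 M Na+, the two calls differ by 0.63 × 50 = 31.5 degrees, which is the entire contribution of the formamide term.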

Prehybridization may be carried out in 6X SSC, 5X Denhardt's reagent, 0.5% SDS, 100 µg denatured fragmented
salmon sperm DNA or 6X SSC, 5X Denhardt's reagent, 0.5% SDS, 100 µg denatured fragmented salmon sperm DNA, 50%
formamide. The formulas for SSC and Denhardt's solutions are listed in Sambrook et al., supra.

Hybridization is conducted by adding the detectable probe to the prehybridization solutions listed above. Where
the probe comprises double stranded DNA, it is denatured before addition to the hybridization solution. The filter is
contacted with the hybridization solution for a sufficient period of time to allow the probe to hybridize to cDNAs or genomic
DNAs containing sequences complementary thereto or homologous thereto. For probes over 200 nucleotides in length, the
hybridization may be carried out at 15-25°C below the Tm. For shorter probes, such as oligonucleotide probes, the
hybridization may be conducted at 5-10°C below the Tm. Preferably, for hybridizations in 6X SSC, the hybridization is
conducted at approximately 68°C. Preferably, for hybridizations in 50% formamide containing solutions, the hybridization
is conducted at approximately 42°C.
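
The rule relating hybridization temperature to Tm and probe length can be written as a small Python helper, given here as a sketch. The function name is hypothetical, and probes of exactly 200 nucleotides are treated as "shorter" probes, which is an assumption since the text only addresses probes "over 200 nucleotides" and "shorter probes".

```python
def hybridization_window(tm_celsius: float, probe_length: int) -> tuple:
    """Return (lowest, highest) hybridization temperature in degrees C.

    Probes over 200 nucleotides: 15-25 degrees below Tm.
    Shorter probes (assumed to include exactly 200 nt): 5-10 degrees below Tm.
    """
    if probe_length > 200:
        smallest_offset, largest_offset = 15.0, 25.0
    else:
        smallest_offset, largest_offset = 5.0, 10.0
    return (tm_celsius - largest_offset, tm_celsius - smallest_offset)


print(hybridization_window(72.0, 20))   # oligonucleotide probe
print(hybridization_window(90.0, 500))  # longer probe
```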

All of the foregoing hybridizations would be considered to be under conditions of high stringency. 

Following hybridization, the filter is washed in 2X SSC, 0.1% SDS at room temperature for 15 minutes. The
filter is then washed with 0.1X SSC, 0.5% SDS at room temperature for 30 minutes to 1 hour. Thereafter, the filter is
washed at the hybridization temperature in 0.1X SSC, 0.5% SDS. A final wash is conducted in 0.1X SSC at room
temperature.

Nucleic acids which have hybridized to the probe are identified by autoradiography or other conventional 
techniques. 

The above procedure may be modified to identify nucleic acids having decreasing levels of homology to the probe
sequence. For example, to obtain nucleic acids of decreasing homology to the detectable probe, less stringent conditions
may be used. For example, the hybridization temperature may be decreased in increments of 5°C from 68°C to 42°C in a
hybridization buffer having a Na+ concentration of approximately 1M. Following hybridization, the filter may be washed
with 2X SSC, 0.5% SDS at the temperature of hybridization. These conditions are considered to be "moderate" conditions
above 50°C and "low" conditions below 50°C. A specific example of "moderate" hybridization conditions is when the
above hybridization is conducted at 55°C. A specific example of "low stringency" hybridization conditions is when the
above hybridization is conducted at 45°C.

Alternatively, the hybridization may be carried out in buffers, such as 6X SSC, containing formamide at a
temperature of 42°C. In this case, the concentration of formamide in the hybridization buffer may be reduced in 5%
increments from 50% to 0% to identify clones having decreasing levels of homology to the probe. Following hybridization,
the filter may be washed with 6X SSC, 0.5% SDS at 50°C. These conditions are considered to be "moderate" conditions
above 25% formamide and "low" conditions below 25% formamide. A specific example of "moderate" hybridization
conditions is when the above hybridization is conducted at 30% formamide. A specific example of "low stringency"
hybridization conditions is when the above hybridization is conducted at 10% formamide.
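
The two stepwise series of reduced-stringency conditions, and the labels attached to them, may be summarized in a brief Python sketch. The function names are hypothetical. Labeling the starting points (68°C, or 50% formamide at 42°C) as "high" reflects the high stringency conditions described earlier and is an interpretive assumption, and the text does not say how conditions exactly at 50°C or 25% formamide are classified, so the sketch reports those as "boundary".

```python
def temperature_series(start: int = 68, stop: int = 42, step: int = 5) -> list:
    """Hybridization temperatures (degrees C) decreasing from start toward stop."""
    return list(range(start, stop - 1, -step))


def formamide_series(start: int = 50, stop: int = 0, step: int = 5) -> list:
    """Formamide concentrations (percent) decreasing from start to stop."""
    return list(range(start, stop - 1, -step))


def classify_by_temperature(temp_c: float) -> str:
    """Stringency label for hybridization in roughly 1 M Na+ without formamide."""
    if temp_c >= 68:
        return "high"  # assumption: the starting condition is the high-stringency one
    if temp_c > 50:
        return "moderate"
    if temp_c < 50:
        return "low"
    return "boundary"  # exactly 50 degrees C is not classified in the text


def classify_by_formamide(percent: float) -> str:
    """Stringency label for hybridization at 42 degrees C in 6X SSC."""
    if percent >= 50:
        return "high"  # assumption: the starting condition is the high-stringency one
    if percent > 25:
        return "moderate"
    if percent < 25:
        return "low"
    return "boundary"  # exactly 25% formamide is not classified in the text


for t in temperature_series():
    print(t, classify_by_temperature(t))
for f in formamide_series():
    print(f, classify_by_formamide(f))
```

With 5°C steps starting at 68°C the series ends at 43°C, since 42°C is not reached exactly.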

Nucleic acids which have hybridized to the probe are identified by autoradiography. 

For example, the preceding methods may be used to isolate nucleic acids having a sequence with at least
97%, at least 95%, at least 90%, at least 85%, at least 80%, or at least 70% homology to a nucleic acid sequence
selected from the group consisting of one of the sequences of SEQ ID NOs: 1, 2, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21,
23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77 and
79, fragments comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, or 500 consecutive
bases thereof, and the sequences complementary thereto. Homology may be measured using BLASTN version 2.0
with the default parameters. For example, the homologous polynucleotides may have a coding sequence which is a
naturally occurring allelic variant of one of the coding sequences described herein. Such allelic variants may have a
substitution, deletion or addition of one or more nucleotides when compared to the nucleic acids of SEQ ID NOs: 1, 2,
3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61,
63, 65, 67, 69, 71, 73, 75, 77 and 79 or the sequences complementary thereto.

Additionally, the above procedures may be used to isolate nucleic acids which encode polypeptides having at
least 99%, at least 95%, at least 90%, at least 85%, at least 80%, or at least 70% homology to a polypeptide having the
sequence of one of SEQ ID NOs: 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 48,
50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, and 80 or fragments comprising at least 5, 10, 15, 20, 25,
30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof as determined using the FASTA version 3.0t78
algorithm with the default parameters.
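
The percentage thresholds recited above can be illustrated with a simplified identity calculation over two sequences that have already been aligned. This sketch is not BLASTN or FASTA and does not reproduce their scoring or default parameters; it merely counts identical positions, ignoring positions where either sequence has a gap character. The function names and the example sequences are hypothetical.

```python
THRESHOLDS = [97, 95, 90, 85, 80, 70]  # nucleic acid thresholds recited above


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical positions between two pre-aligned sequences.

    Positions where either sequence has a gap ('-') are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = [(a, b) for a, b in zip(aligned_a, aligned_b)
                if a != "-" and b != "-"]
    if not compared:
        return 0.0
    matches = sum(a == b for a, b in compared)
    return 100.0 * matches / len(compared)


def thresholds_met(identity: float) -> list:
    """Return the recited thresholds that the given identity satisfies."""
    return [t for t in THRESHOLDS if identity >= t]


# Hypothetical aligned sequences differing at one of ten positions.
identity = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
print(identity, thresholds_met(identity))
```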

Another aspect of the present invention is an isolated or purified polypeptide comprising the sequence of one
of SEQ ID NOs: 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 48, 50, 52, 54, 56,
58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, and 80 or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50,
75, 100, or 150 consecutive amino acids thereof. As discussed above, such polypeptides may be obtained by inserting
a nucleic acid encoding the polypeptide into a vector such that the coding sequence is operably linked to a sequence
capable of driving the expression of the encoded polypeptide in a suitable host cell. For example, the expression
vector may comprise a promoter, a ribosome binding site for translation initiation and a transcription terminator. The
vector may also include appropriate sequences for amplifying expression.

Promoters suitable for expressing the polypeptide or fragment thereof in bacteria include the E. coli lac or
trp promoters, the lacI promoter, the lacZ promoter, the T3 promoter, the T7 promoter, the gpt promoter, the lambda
PR promoter, the lambda PL promoter, the trp promoter, promoters from operons encoding glycolytic enzymes such as
3-phosphoglycerate kinase (PGK), and the acid phosphatase promoter. Fungal promoters include the α factor promoter.
Eukaryotic promoters include the CMV immediate early promoter, the HSV thymidine kinase promoter, heat shock
promoters, the early and late SV40 promoter, LTRs from retroviruses, and the mouse metallothionein-I promoter.
Other promoters known to control expression of genes in prokaryotic or eukaryotic cells or their viruses may also be
used.

Mammalian expression vectors may also comprise an origin of replication, any necessary ribosome binding
sites, a polyadenylation site, splice donor and acceptor sites, transcriptional termination sequences, and 5' flanking
nontranscribed sequences. In some embodiments, DNA sequences derived from the SV40 splice and polyadenylation
sites may be used to provide the required nontranscribed genetic elements.

Vectors for expressing the polypeptide or fragment thereof in eukaryotic cells may also contain enhancers to
increase expression levels. Enhancers are cis-acting elements of DNA, usually from about 10 to about 300 bp in
length, that act on a promoter to increase its transcription. Examples include the SV40 enhancer on the late side of the
replication origin bp 100 to 270, the cytomegalovirus early promoter enhancer, the polyoma enhancer on the late side
of the replication origin, and the adenovirus enhancers.

WO 00/18909 PCT/US99/22752

In addition, the expression vectors preferably contain one or more selectable marker genes to permit
selection of host cells containing the vector. Such selectable markers include genes encoding dihydrofolate reductase
or genes conferring neomycin resistance for eukaryotic cell culture, genes conferring tetracycline or ampicillin
resistance in E. coli, and the S. cerevisiae TRP1 gene.

In some embodiments, the nucleic acid encoding one of the polypeptides of SEQ ID NOs: 4, 6, 8, 10, 12, 14,
16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74,
76, 78, and 80 or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino
acids thereof is assembled in appropriate phase with a leader sequence capable of directing secretion of the translated
polypeptide or fragment thereof. Optionally, the nucleic acid can encode a fusion polypeptide in which one of the
polypeptides of SEQ ID NOs: 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 48, 50,
52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, and 80 or fragments comprising at least 5, 10, 15, 20, 25, 30,
35, 40, 50, 75, 100, or 150 consecutive amino acids thereof is fused to heterologous peptides or polypeptides, such as
N-terminal identification peptides which impart desired characteristics, such as increased stability or simplified
purification.

The appropriate DNA sequence may be inserted into the vector by a variety of procedures. In general, the
DNA sequence is ligated to the desired position in the vector following digestion of the insert and the vector with
appropriate restriction endonucleases. Alternatively, blunt ends in both the insert and the vector may be ligated. A
variety of cloning techniques are disclosed in Ausubel et al., Current Protocols in Molecular Biology, John Wiley &
Sons, Inc. 1997 and Sambrook et al., Molecular Cloning: A Laboratory Manual 2d Ed., Cold Spring Harbor Laboratory
Press, 1989. Such procedures and others are deemed to be within the scope of those skilled in the art.

The vector may be, for example, in the form of a plasmid, a viral particle, or a phage. Other vectors include
chromosomal, nonchromosomal and synthetic DNA sequences, derivatives of SV40, bacterial plasmids, phage DNA,
baculovirus, yeast plasmids, vectors derived from combinations of plasmids and phage DNA, and viral DNA such as
vaccinia, adenovirus, fowl pox virus, and pseudorabies. A variety of cloning and expression vectors for use with
prokaryotic and eukaryotic hosts are described by Sambrook et al., Molecular Cloning: A Laboratory Manual, Second
Edition, Cold Spring Harbor, N.Y., (1989).

Particular bacterial vectors which may be used include the commercially available plasmids comprising
genetic elements of the well known cloning vector pBR322 (ATCC 37017), pKK223-3 (Pharmacia Fine Chemicals,
Uppsala, Sweden), GEM1 (Promega Biotec, Madison, WI, USA), pQE70, pQE60, pQE-9 (Qiagen), pD10, psiX174,
pBluescript II KS, pNH8A, pNH16a, pNH18A, pNH46A (Stratagene), ptrc99a, pKK223-3, pKK233-3, pDR540, pRIT5
(Pharmacia), pKK232-8 and pCM7. Particular eukaryotic vectors include pSV2CAT, pOG44, pXT1, pSG (Stratagene),
pSVK3, pBPV, pMSG, and pSVL (Pharmacia). However, any other vector may be used as long as it is replicable and
viable in the host cell.

The host cell may be any of the host cells familiar to those skilled in the art, including prokaryotic cells,
eukaryotic cells, mammalian cells, insect cells, or plant cells. As representative examples of appropriate hosts, there
may be mentioned: bacterial cells, such as E. coli, Streptomyces, Bacillus subtilis, Salmonella typhimurium and various
species within the genera Pseudomonas, Streptomyces, and Staphylococcus; fungal cells, such as yeast; insect cells
such as Drosophila S2 and Spodoptera Sf9; animal cells such as CHO, COS or Bowes melanoma; and adenoviruses.
The selection of an appropriate host is within the abilities of those skilled in the art.

The vector may be introduced into the host cells using any of a variety of techniques, including
transformation, transfection, transduction, viral infection, gene guns, or Ti-mediated gene transfer. Particular
methods include calcium phosphate transfection, DEAE-Dextran mediated transfection, lipofection, or electroporation
(Davis, L., Dibner, M., Battey, I., Basic Methods in Molecular Biology, (1986)).

Where appropriate, the engineered host cells can be cultured in conventional nutrient media modified as
appropriate for activating promoters, selecting transformants or amplifying the genes of the present invention.
Following transformation of a suitable host strain and growth of the host strain to an appropriate cell density, the
selected promoter may be induced by appropriate means (e.g., temperature shift or chemical induction) and the cells
may be cultured for an additional period to allow them to produce the desired polypeptide or fragment thereof.

Cells are typically harvested by centrifugation, disrupted by physical or chemical means, and the resulting
crude extract is retained for further purification. Microbial cells employed for expression of proteins can be disrupted
by any convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or use of cell lysing
agents. Such methods are well known to those skilled in the art. The expressed polypeptide or fragment thereof can
be recovered and purified from recombinant cell cultures by methods including ammonium sulfate or ethanol
precipitation, acid extraction, anion or cation exchange chromatography, phosphocellulose chromatography,
hydrophobic interaction chromatography, affinity chromatography, hydroxylapatite chromatography and lectin
chromatography. Protein refolding steps can be used, as necessary, in completing configuration of the polypeptide. If
desired, high performance liquid chromatography (HPLC) can be employed for final purification steps.

Various mammalian cell culture systems can also be employed to express recombinant protein. Examples of
mammalian expression systems include the COS-7 lines of monkey kidney fibroblasts (described by Gluzman, Cell,
23:175 (1981)), and other cell lines capable of expressing proteins from a compatible vector, such as the C127, 3T3,
CHO, HeLa and BHK cell lines.

The constructs in host cells can be used in a conventional manner to produce the gene product encoded by
the recombinant sequence. Depending upon the host employed in a recombinant production procedure, the
polypeptides produced by host cells containing the vector may be glycosylated or may be non-glycosylated.
Polypeptides of the invention may or may not also include an initial methionine amino acid residue.

Alternatively, the polypeptides of SEQ ID NOs: 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34,
36, 38, 40, 42, 44, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, and 80 or fragments comprising at
least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof can be synthetically produced
by conventional peptide synthesizers. In other embodiments, fragments or portions of the polypeptides may be
employed for producing the corresponding full-length polypeptide by peptide synthesis; therefore, the fragments may
be employed as intermediates for producing the full-length polypeptides.

Cell-free translation systems can also be employed to produce one of the polypeptides of SEQ ID NOs: 4, 6,
8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66,
68, 70, 72, 74, 76, 78, and 80 or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150
consecutive amino acids thereof using mRNAs transcribed from a DNA construct comprising a promoter operably
linked to a nucleic acid encoding the polypeptide or fragment thereof. In some embodiments, the DNA construct may
be linearized prior to conducting an in vitro transcription reaction. The transcribed mRNA is then incubated with an
appropriate cell-free translation extract, such as a rabbit reticulocyte extract, to produce the desired polypeptide or
fragment thereof.

The present invention also relates to variants of the polypeptides of SEQ ID NOs: 4, 6, 8, 10, 12, 14, 16, 18,
20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78,
and 80 or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids
thereof. The term "variant" includes derivatives or analogs of these polypeptides. In particular, the variants may
differ in amino acid sequence from the polypeptides of SEQ ID NOs: 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28,
30, 32, 34, 36, 38, 40, 42, 44, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, and 80 by one or more
substitutions, additions, deletions, fusions and truncations, which may be present in any combination.

The variants may be naturally occurring or created in vitro. In particular, such variants may be created using
genetic engineering techniques such as site directed mutagenesis, random chemical mutagenesis, Exonuclease III
deletion procedures, and standard cloning techniques. Alternatively, such variants, fragments, analogs, or derivatives
may be created using chemical synthesis or modification procedures.

Other methods of making variants are also familiar to those skilled in the art. These include procedures in
which nucleic acid sequences obtained from natural isolates are modified to generate nucleic acids which encode
polypeptides having characteristics which enhance their value in industrial or laboratory applications. In such
procedures, a large number of variant sequences having one or more nucleotide differences with respect to the
sequence obtained from the natural isolate are generated and characterized. Preferably, these nucleotide differences
result in amino acid changes with respect to the polypeptides encoded by the nucleic acids from the natural isolates.

For example, variants may be created using error prone PCR. In error prone PCR, PCR is performed under
conditions where the copying fidelity of the DNA polymerase is low, such that a high rate of point mutations is
obtained along the entire length of the PCR product. Error prone PCR is described in Leung, D.W., et al., Technique,
1:11-15 (1989) and Caldwell, R. C. & Joyce G.F., PCR Methods Applic., 2:28-33 (1992). Briefly, in such procedures,
nucleic acids to be mutagenized are mixed with PCR primers, reaction buffer, MgCl2, MnCl2, Taq polymerase and an
appropriate concentration of dNTPs for achieving a high rate of point mutation along the entire length of the PCR
product. For example, the reaction may be performed using 20 fmoles of nucleic acid to be mutagenized, 30 pmoles of
each PCR primer, a reaction buffer comprising 50mM KCl, 10mM Tris HCl (pH 8.3) and 0.01% gelatin, 7mM MgCl2,
0.5mM MnCl2, 5 units of Taq polymerase, 0.2mM dGTP, 0.2mM dATP, 1mM dCTP, and 1mM dTTP. PCR may be
performed for 30 cycles of 94° C for 1 min, 45° C for 1 min, and 72° C for 1 min. However, it will be appreciated
that these parameters may be varied as appropriate. The mutagenized nucleic acids are cloned into an appropriate
vector and the activities of the polypeptides encoded by the mutagenized nucleic acids are evaluated.
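
By way of illustration only, the effect of low copying fidelity may be modeled in silico. The following Python
sketch is not part of the described procedure; the function names, the uniform per-base error rate, and the example
template are assumptions introduced for illustration. It copies a template while substituting each base with a given
probability, which approximates the accumulation of point mutations along the entire length of a PCR product.

```python
import random

BASES = "ACGT"


def error_prone_copy(template: str, error_rate: float, rng: random.Random) -> str:
    """Copy a DNA template, substituting each base with probability error_rate.

    Assumption: errors are independent and uniform over the three wrong bases,
    which is a simplification of real polymerase error spectra.
    """
    copy = []
    for base in template:
        if rng.random() < error_rate:
            # Point substitution: choose any base other than the templated one.
            copy.append(rng.choice([b for b in BASES if b != base]))
        else:
            copy.append(base)
    return "".join(copy)


def count_differences(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    template = "ATGGCCAAGTTCGGA"  # hypothetical sequence, not one of the SEQ ID NOs
    mutant = error_prone_copy(template, 0.1, rng)
    print(mutant, count_differences(template, mutant))
```

Repeated calls with the same template yield a population of variants whose encoded polypeptides could then be
evaluated as described above.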

Variants may also be created using oligonucleotide directed mutagenesis to generate site-specific mutations
in any cloned DNA segment of interest. Oligonucleotide mutagenesis is described in Reidhaar-Olson, J.F. & Sauer,
R.T., et al., Science, 241:53-57 (1988). Briefly, in such procedures a plurality of double stranded oligonucleotides
bearing one or more mutations to be introduced into the cloned DNA are synthesized and inserted into the cloned DNA
to be mutagenized. Clones containing the mutagenized DNA are recovered and the activities of the polypeptides they
encode are assessed.

Another method for generating variants is assembly PCR. Assembly PCR involves the assembly of a PCR
product from a mixture of small DNA fragments. A large number of different PCR reactions occur in parallel in the
same vial, with the products of one reaction priming the products of another reaction. Assembly PCR is described in
U.S. Patent Application Serial No. 08/677,112, filed July 9, 1997 and U.S. Patent Application Serial No. 08/942,504,
filed October 31, 1997.

Still another method of generating variants is sexual PCR mutagenesis. In sexual PCR mutagenesis, forced
homologous recombination occurs between DNA molecules of different but highly related DNA sequence in vitro, as a
result of random fragmentation of the DNA molecule based on sequence homology, followed by fixation of the
crossover by primer extension in a PCR reaction. Sexual PCR mutagenesis is described in Stemmer, W.P., PNAS, USA,
91:10747-10751 (1994). Briefly, in such procedures a plurality of nucleic acids to be recombined are digested with
DNAse to generate fragments having an average size of 50-200 nucleotides. Fragments of the desired average size
are purified and resuspended in a PCR mixture. PCR is conducted under conditions which facilitate recombination
between the nucleic acid fragments. For example, PCR may be performed by resuspending the purified fragments at a
concentration of 10-30ng/µl in a solution of 0.2mM of each dNTP, 2.2mM MgCl2, 50mM KCl, 10mM Tris HCl, pH
9.0, and 0.1% Triton X-100. 2.5 units of Taq polymerase per 100µl of reaction mixture is added and PCR is
performed using the following regime: 94° C for 60 seconds, 94° C for 30 seconds, 50-55° C for 30 seconds, 72° C
for 30 seconds (30-45 times) and 72° C for 5 minutes. However, it will be appreciated that these parameters may be
varied as appropriate. In some embodiments, oligonucleotides may be included in the PCR reactions. In other
embodiments, the Klenow fragment of DNA polymerase I may be used in a first set of PCR reactions and Taq
polymerase may be used in a subsequent set of PCR reactions. Recombinant sequences are isolated and the activities
of the polypeptides they encode are assessed.
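
The outcome of such recombination may be pictured with a deliberately simplified in silico model. The Python sketch
below is an illustration only and does not simulate fragment annealing or primer extension; it assumes that the
parent sequences are already aligned to the same length, and the function name and crossover scheme are
hypothetical. It builds a chimera by copying successive segments from randomly chosen parents.

```python
import random


def shuffle_parents(parents: list[str], n_crossovers: int, rng: random.Random) -> str:
    """Build one chimeric sequence from equal-length, pre-aligned parents.

    Assumption: crossovers fall at random positions and each segment is
    copied from a parent chosen uniformly at random.
    """
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to the same length")
    # Distinct crossover positions strictly inside the sequence, in order.
    points = sorted(rng.sample(range(1, length), n_crossovers))
    chimera = []
    start = 0
    for end in points + [length]:
        donor = rng.choice(parents)  # parent supplying this segment
        chimera.append(donor[start:end])
        start = end
    return "".join(chimera)


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical, highly related parent sequences (not SEQ ID NOs).
    parents = ["ATGAAAGTTCTGGCA", "ATGCGTGTTATTGCA"]
    print(shuffle_parents(parents, 3, rng))
```

Each chimera produced in this way has the same length as its parents and contains, at every position, a base found
at that position in at least one parent.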

Variants may also be created by in vivo mutagenesis. In some embodiments, random mutations in a
sequence of interest are generated by propagating the sequence of interest in a bacterial strain, such as an E. coli
strain, which carries mutations in one or more of the DNA repair pathways. Such "mutator" strains have a higher
random mutation rate than that of a wild-type parent. Propagating the DNA in one of these strains will eventually
generate random mutations within the DNA. Mutator strains suitable for use for in vivo mutagenesis are described in
PCT Published Application WO 91/16427.

Variants may also be generated using cassette mutagenesis. In cassette mutagenesis a small region of a
double stranded DNA molecule is replaced with a synthetic oligonucleotide "cassette" that differs from the native
sequence. The oligonucleotide often contains completely and/or partially randomized native sequence.
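
As a hedged illustration of the replacement step, the following Python sketch enumerates the variants obtained when
a region of a sequence is replaced by a randomized cassette. The use of IUPAC degenerate codes (N for any base, K for
G or T), the function names, and the example sequence are assumptions introduced here and are not taken from the
procedures described above.

```python
from itertools import product

# Subset of IUPAC nucleotide codes used in this sketch.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT", "K": "GT"}


def expand_cassette(degenerate: str) -> list[str]:
    """List every concrete oligonucleotide encoded by a degenerate cassette."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in degenerate))]


def replace_region(dna: str, start: int, end: int, cassette: str) -> str:
    """Replace dna[start:end] (0-based, end exclusive) with the cassette."""
    return dna[:start] + cassette + dna[end:]


def cassette_library(dna: str, start: int, end: int, degenerate: str) -> list[str]:
    """Build all variants in which the region is replaced by the cassette."""
    return [replace_region(dna, start, end, c) for c in expand_cassette(degenerate)]


if __name__ == "__main__":
    # Hypothetical 9-nucleotide sequence; its middle codon is randomized as NNK.
    library = cassette_library("GGGCCCTTT", 3, 6, "NNK")
    print(len(library), library[:4])
```

A fully randomized cassette grows quickly in size (four-fold per N position), which is one reason partially
randomized cassettes may be chosen.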

Recursive ensemble mutagenesis may also be used to generate variants. Recursive ensemble mutagenesis is
an algorithm for protein engineering (protein mutagenesis) developed to produce diverse populations of phenotypically
related mutants whose members differ in amino acid sequence. This method uses a feedback mechanism to control
successive rounds of combinatorial cassette mutagenesis. Recursive ensemble mutagenesis is described in Arkin, A.P.
and Youvan, D.C., PNAS, USA, 89:7811-7815 (1992).

In some embodiments, variants are created using exponential ensemble mutagenesis. Exponential ensemble
mutagenesis is a process for generating combinatorial libraries with a high percentage of unique and functional
mutants, wherein small groups of residues are randomized in parallel to identify, at each altered position, amino acids
which lead to functional proteins. Exponential ensemble mutagenesis is described in Delegrave, S. and Youvan, D.C.,
Biotechnology Research, 11:1548-1552 (1993). Random and site-directed mutagenesis are described in Arnold, F.H.,
Current Opinion in Biotechnology, 4:450-455 (1993).

In some embodiments, the variants are created using shuffling procedures wherein portions of a plurality of
nucleic acids which encode distinct polypeptides are fused together to create chimeric nucleic acid sequences which
encode chimeric polypeptides. Shuffling procedures are described in U.S. Patent Application Serial No. 08/677,112,
filed July 9, 1996, U.S. Patent Application Serial No. 08/942,504, filed October 31, 1997, U.S. Patent No.
5,939,250, issued August 17, 1999, and U.S. Patent Application Serial No. 09/375,605, filed August 17, 1999.

The variants of the polypeptides of SEQ ID NOs: 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34,
36, 38, 40, 42, 44, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, and 80 may be (i) variants in
which one or more of the amino acid residues of the polypeptides of SEQ ID NOs: 4, 6, 8, 10, 12, 14, 16, 18, 20, 22,
24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, and 80
are substituted with a conserved or non-conserved amino acid residue (preferably a conserved amino acid residue) and
such substituted amino acid residue may or may not be one encoded by the genetic code.

Conservative substitutions are those that substitute a given amino acid in a polypeptide by another amino
acid of like characteristics. Typically seen as conservative substitutions are the following replacements: replacements
of an aliphatic amino acid such as Ala, Val, Leu and Ile with another aliphatic amino acid; replacement of a Ser with a
Thr or vice versa; replacement of an acidic residue such as Asp and Glu with another acidic residue; replacement of a
residue bearing an amide group, such as Asn and Gln, with another residue bearing an amide group; exchange of a
basic residue such as Lys and Arg with another basic residue; and replacement of an aromatic residue such as Phe,
Tyr with another aromatic residue.
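
The replacement classes listed above may be expressed as a simple lookup. The following Python sketch is an
illustration only; the group table contains just the residues named in the preceding paragraph (other residues with
like characteristics are not listed there and are therefore omitted), and the function name is hypothetical.

```python
# Residue groups named in the text, using three-letter amino acid codes.
CONSERVATIVE_GROUPS = [
    {"Ala", "Val", "Leu", "Ile"},  # aliphatic
    {"Ser", "Thr"},                # Ser <-> Thr
    {"Asp", "Glu"},                # acidic
    {"Asn", "Gln"},                # amide-bearing
    {"Lys", "Arg"},                # basic
    {"Phe", "Tyr"},                # aromatic
]


def is_conservative(original: str, replacement: str) -> bool:
    """Return True if replacing `original` by `replacement` stays within one group."""
    if original == replacement:
        return False  # identical residues are not a substitution at all
    return any(original in g and replacement in g for g in CONSERVATIVE_GROUPS)


if __name__ == "__main__":
    print(is_conservative("Leu", "Ile"))  # same aliphatic group
    print(is_conservative("Asp", "Lys"))  # acidic to basic
```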
Other variants are those in which one or more of the amino acid residues of the polypeptides of SEQ ID NOs:
4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 48, 50, 52, 54, 56, 58, 60, 62, 64,
66, 68, 70, 72, 74, 76, 78, and 80 include a substituent group.

Still other variants are those in which the polypeptide is associated with another compound, such as a
compound to increase the half-life of the polypeptide (for example, polyethylene glycol).

Additional variants are those in which additional amino acids are fused to the polypeptide, such as a leader 
sequence, a secretory sequence, a proprotein sequence or a sequence which facilitates purification, enrichment, or 
stabilization of the polypeptide. 

In some embodiments, the fragments, derivatives and analogs retain the same biological function or activity
as the polypeptides of SEQ ID NOs: 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44,
48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, and 80. In other embodiments, the fragment,
derivative, or analog includes a proprotein, such that the fragment, derivative, or analog can be activated by cleavage
of the proprotein portion to produce an active polypeptide.

Another aspect of the present invention is polypeptides or fragments thereof which have at least 70%, at
least 80%, at least 85%, at least 90%, at least 95%, or more than 95% homology to one of the polypeptides of SEQ
ID NOs: 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 48, 50, 52, 54, 56, 58, 60,
62, 64, 66, 68, 70, 72, 74, 76, 78, and 80 or a fragment comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75,
100, or 150 consecutive amino acids thereof. Homology may be determined using a program, such as FASTA version
3.0t78 with the default parameters, which aligns the polypeptides or fragments being compared and determines the
extent of amino acid identity or similarity between them. It will be appreciated that amino acid "homology" includes
conservative amino acid substitutions such as those described above.
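
To make the distinction between identity and similarity concrete, the following Python sketch scores two sequences
that have already been aligned. It is an illustration only and does not reproduce the FASTA program or its
parameters; the choice to ignore gapped columns, the similarity groups (one-letter codes for the conservative
replacements named in this specification), and the function name are assumptions.

```python
# One-letter codes for the conservative replacement classes named in the text.
SIMILAR_GROUPS = ["AVLI", "ST", "DE", "NQ", "KR", "FY"]


def percent_identity_similarity(a: str, b: str) -> tuple[float, float]:
    """Return (% identity, % similarity) for two pre-aligned sequences.

    Gaps are written as '-'. Assumption: columns containing a gap are
    excluded from both the numerator and the denominator.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0, 0.0
    identical = sum(x == y for x, y in columns)
    similar = sum(
        x == y or any(x in g and y in g for g in SIMILAR_GROUPS)
        for x, y in columns
    )
    n = len(columns)
    return 100.0 * identical / n, 100.0 * similar / n


if __name__ == "__main__":
    # Hypothetical aligned peptides, not taken from the sequence listing.
    print(percent_identity_similarity("MKVLAD", "MRVIAE"))
```

A threshold such as "at least 70%" may then be applied to whichever of the two figures is adopted as the measure of
homology.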

The polypeptides or fragments having homology to one of the polypeptides of SEQ ID NOs: 4, 6, 8, 10, 12,
14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72,
74, 76, 78, and 80 or a fragment comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive
amino acids thereof may be obtained by isolating the nucleic acids encoding them using the techniques described
above.

Alternatively, the homologous polypeptides or fragments may be obtained through biochemical enrichment or
purification procedures. The sequence of potentially homologous polypeptides or fragments may be determined by
proteolytic digestion, gel electrophoresis and/or microsequencing. The sequence of the prospective homologous
polypeptide or fragment can be compared to one of the polypeptides of SEQ ID NOs: 4, 6, 8, 10, 12, 14, 16, 18, 20,
22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, and
80 or a fragment comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids
thereof using a program such as FASTA version 3.0t78 with the default parameters.

The polypeptides of SEQ ID NOs: 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40,
42, 44, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, and 80 or fragments comprising at least 5,
10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof may be used in a variety of
applications. For example, the polypeptides or fragments thereof may be used to catalyze biochemical reactions. In
particular, the polypeptides of SEQ ID NOs: 14 and 46, which have homology to glutamate semialdehyde amino
transferase, or fragments thereof, may be used to catalyze the synthesis of 5-aminolevulinate from (S)-4-amino-5-
oxopentanoate. The polypeptides of SEQ ID NOs: 26 and 58, which have homology to triose phosphate isomerase, or
fragments thereof, may be used to catalyze the synthesis of glycerone phosphate from D-glyceraldehyde 3-phosphate.
The polypeptides of SEQ ID NOs: 32 and 64, which have homology to dCMP deaminase, or fragments thereof, may be
used to catalyze the reaction of deoxycytidine and water to produce deoxyuridine and ammonia. The polypeptides of
SEQ ID NOs: 38 and 72, which have homology to the MenA protein, or fragments thereof, may be used to catalyze the
synthesis of menaquinone. The polypeptide of SEQ ID NO: 80, which has homology to glucose-1-dehydrogenase, may
be used to catalyze the synthesis of D-glucono-1,5-lactone from D-glucose.

The polypeptide of SEQ ID NO: 10, which has homology to lysyl tRNA synthetase, or fragments thereof, may
be used to identify compounds capable of specifically inhibiting the growth of Cenarchaeum symbiosum, since tRNA
synthetases are attractive targets for agents which inhibit growth.

Agents which specifically inhibit the activity of the lysyl tRNA synthetase from Cenarchaeum symbiosum
may be identified using a variety of methods known to those skilled in the art. For example, a plurality of agents may
be generated using combinatorial chemistry or recombinant DNA libraries encoding a large number of short peptides.
The lysyl tRNA synthetases from Cenarchaeum symbiosum and control organisms are contacted with the agents and
those agents which bind to the lysyl tRNA synthetase from Cenarchaeum symbiosum but not to the enzyme from the
control organisms are identified. Cenarchaeum symbiosum is then contacted with the identified agents to determine
which agents inhibit the organism's growth.

The polypeptides of SEQ ID NOs: 28 and 60, which have homology to the TATA box binding protein, or
fragments thereof, may be used to identify promoters in nucleic acids from Cenarchaeum symbiosum. In such
procedures, the polypeptide or fragment thereof is allowed to contact the nucleic acid and binding of the polypeptide
or fragment thereof to the nucleic acid is detected. Binding may be detected by performing a gel shift analysis or a
nuclease protection analysis, or by detecting the retention of the nucleic acid on a column matrix having the TATA box
binding protein, or a fragment thereof, affixed thereto.

Compounds which specifically inhibit the binding of the TATA box binding protein of Cenarchaeum symbiosum
to promoters may also be used to inhibit growth of the organism. Such compounds may be identified as described
above.

Similarly, agents which specifically inhibit the activity of the polypeptides of SEQ ID NOs: 34 and 66, which
have homology to RNA helicase, may be used to inhibit the growth of Cenarchaeum symbiosum. Such agents may be
identified as described above.

The polypeptides of SEQ ID NOs: 30 and 62, which have homology to DNA polymerase I, or fragments
thereof, may be used to insert a detectable label into a nucleic acid or to generate blunt ends on nucleic acids which
have been digested with a restriction endonuclease.

The polypeptides of SEQ ID NOs: 42 and 76, which have homology to site specific DNA
methyltransferases, or fragments thereof, may be used in procedures in which it is desirable to protect nucleic acid
sequences from digestion with restriction endonucleases. For example, a nucleic acid sequence having one or more
restriction sites therein may be treated with the polypeptides of SEQ ID NOs: 42 or 76 prior to the addition of linkers
to the nucleic acid. Thereafter, the linkers may be digested with the restriction enzyme, while the sites in the
remainder of the nucleic acid are protected from digestion.

The polypeptides of SEQ ID NOs: 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40,
42, 44, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, and 80, or fragments comprising at least 5,
10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof, may also be used to generate
antibodies which bind specifically to the polypeptides or fragments. The resulting antibodies may be used to
determine whether a biological sample contains Cenarchaeum symbiosum. In such procedures, a biological sample is
contacted with an antibody capable of specifically binding to one of the polypeptides of SEQ ID NOs: 4, 6, 8, 10, 12,
14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72,
74, 76, 78, and 80 or fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive
amino acids thereof. The ability of the biological sample to bind to the antibody is then determined. For example,
binding may be determined by labeling the antibody with a detectable label such as a fluorescent agent, an enzymatic
label, or a radioisotope. Alternatively, binding of the antibody to the sample may be detected using a secondary
antibody having such a detectable label thereon. A variety of assay protocols which may be used to detect the
presence of Cenarchaeum symbiosum in a sample are familiar to those skilled in the art. Particular assays include
ELISA assays, sandwich assays, radioimmunoassays, and Western Blots.

Polyclonal antibodies generated against the polypeptides of SEQ ID NOs: 4, 6, 8, 10, 12, 14, 16, 18, 20, 22,
24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, and 80 or
fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof can
be obtained by direct injection of the polypeptides into an animal or by administering the polypeptides to an animal,
preferably a nonhuman. The antibody so obtained will then bind the polypeptide itself. In this manner, even a
sequence encoding only a fragment of the polypeptide can be used to generate antibodies which may bind to the
whole native polypeptide. Such antibodies can then be used to isolate the polypeptide from cells expressing that
polypeptide.

For preparation of monoclonal antibodies, any technique which provides antibodies produced by continuous
cell line cultures can be used. Examples include the hybridoma technique (Kohler and Milstein, 1975, Nature,
256:495-497), the trioma technique, the human B cell hybridoma technique (Kozbor et al., 1983, Immunology Today
4:72), and the EBV-hybridoma technique (Cole, et al., 1985, in Monoclonal Antibodies and Cancer Therapy, Alan R.
Liss, Inc., pp. 77-96).

Techniques described for the production of single chain antibodies (U.S. Patent No. 4,946,778) can be
adapted to produce single chain antibodies to the polypeptides of SEQ ID NOs: 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24,
26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, and 80 or
fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof.
Alternatively, transgenic mice may be used to express humanized antibodies to these polypeptides or fragments
thereof.

Antibodies generated against the polypeptides of SEQ ID NOs: 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26,
28, 30, 32, 34, 36, 38, 40, 42, 44, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, and 80 or
fragments comprising at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids thereof may
be used in screening for similar polypeptides from other organisms and samples. In such techniques, polypeptides
from the organism are contacted with the antibody and those polypeptides which specifically bind the antibody are
detected. Any of the procedures described above may be used to detect antibody binding. One such screening assay
is described in "Methods for Measuring Cellulase Activities", Methods in Enzymology, Vol 160, pp. 87-116.

As used herein the term "nucleic acid codes of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45,
57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77"
encompasses the nucleotide sequences of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63,
65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77, fragments of SEQ ID
NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23,
35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77, nucleotide sequences homologous to SEQ ID NOs. 1, 2, 5, 9, 13, 25,
27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51,
53, 55, 69, 73 and 77 or homologous to fragments of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57,
59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77, and
sequences complementary to all of the preceding sequences. The fragments include portions of SEQ ID NOs. 1, 2, 5, 9,
13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47,
49, 51, 53, 55, 69, 73 and 77 comprising at least 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, 150, 200, 300, 400, or 500
consecutive nucleotides of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75,
79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77. Preferably, the fragments are novel
fragments. Homologous sequences and fragments of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57,
59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77 refer to a
sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 85%, 80%, 75% or 70% homology to these sequences.
Homology may be determined using any of the computer programs and parameters described herein, including BLASTN
version 2.0 with the default parameters. Homologous sequences also include RNA sequences in which uridines replace the
thymines in the nucleic acid codes of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65,
67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77. The homologous
sequences may be obtained using any of the procedures described herein or may result from the correction of a sequencing
error. It will be appreciated that the nucleic acid codes of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45,
57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77 can
be represented in the traditional single character format (See the inside back cover of Stryer, Lubert. Biochemistry, 3rd
edition. W. H Freeman & Co., New York.) or in any other format which records the identity of the nucleotides in a sequence.
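
Several of the sequence relationships defined above (fragments of consecutive nucleotides, complementary sequences,
and RNA sequences in which uridines replace thymines) are simple string operations on the single character format.
The following Python sketch is offered as an illustration only; the function names and the example sequence are
hypothetical, and the complementary strand is written 5' to 3' (the reverse complement) by assumption.

```python
# Translation table mapping each DNA base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def to_rna(dna: str) -> str:
    """Return the RNA form of a DNA sequence (uridine in place of thymine)."""
    return dna.replace("T", "U")


def fragments(seq: str, length: int) -> list[str]:
    """Return every run of `length` consecutive residues in seq."""
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


if __name__ == "__main__":
    example = "ATGCATTGGC"  # hypothetical sequence, not one of the SEQ ID NOs
    print(reverse_complement(example))
    print(to_rna(example))
    print(fragments(example, 8))
```

The same `fragments` function applies unchanged to polypeptide sequences written in single character format.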

As used herein the term "polypeptide codes of SEQ ID NOS. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 
60, 62, 64, 66, 68, 72, 76, 80, 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78" 
encompasses the polypeptide sequence of SEQ 10 NOs. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 
68, 72, 76, 80, 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78 which are encoded by the 

1 0 extended cDNAs of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 

7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77, polypeptide sequences homologous to the 
polypeptides of SEQ ID NOS. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, 80, 4, 8, 12, 
16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78, or fragments of any of the preceding sequences. 
Homologous polypeptide sequences refer to a polypeptide sequence having at least 99%, 98%, 97%, 96%, 95%, 90%, 

15 85%, 80%, 75% or 70% homology to one of the polypeptide sequences of SEQ ID NOS. 6, 10, 14, 26, 28, 30, 32, 34, 

38, 42, 48, 58, 60, 62, 64, 66, 68, 72, 76, 80, 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, 
and 78. Homology may be determined using any of the computer programs and parameters described herein, including 
FASTA version 3.0t78 with the default parameters or with any modified parameters. The homologous sequences may be 
obtained using any of the procedures described herein or may result from the correction of a sequencing error. The
polypeptide fragments comprise at least 5, 10, 15, 20, 25, 30, 35, 40, 50, 75, 100, or 150 consecutive amino acids of the
polypeptides of SEQ ID NOS. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, 80, 4, 8, 12,
16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78. Preferably, the fragments are novel fragments. It
will be appreciated that the polypeptide codes of SEQ ID NOS. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60,
62, 64, 66, 68, 72, 76, 80, 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78 can be
represented in the traditional single character format or three letter format (see the inside back cover of Stryer, Lubert.
Biochemistry, 3rd edition. W. H. Freeman & Co., New York) or in any other format which relates the identity of the
polypeptides in a sequence. 
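
The two representations mentioned above can be converted into one another mechanically. The following is a minimal sketch, assuming the standard 20 amino acids and a hyphen as the separator in the three letter format; the function names `to_three_letter` and `to_one_letter` are illustrative and are not part of the specification.

```python
# Standard one-letter to three-letter amino acid codes (20 common residues).
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
THREE_TO_ONE = {three: one for one, three in ONE_TO_THREE.items()}


def to_three_letter(sequence):
    """Convert a single character polypeptide code to hyphen-separated three letter codes."""
    return "-".join(ONE_TO_THREE[residue] for residue in sequence.upper())


def to_one_letter(sequence):
    """Convert hyphen-separated three letter codes back to the single character format."""
    if not sequence:
        return ""
    return "".join(THREE_TO_ONE[code] for code in sequence.split("-"))
```

Either form carries the same information, so a stored polypeptide code can be kept in whichever format the downstream program expects.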

It will be appreciated by those skilled in the art that the nucleic acid codes of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 
27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 
53, 55, 69, 73 and 77 and polypeptide codes of SEQ ID NOS. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62,
64, 66, 68, 72, 76, 80, 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78 can be stored,
recorded, and manipulated on any medium which can be read and accessed by a computer. As used herein, the words
"recorded" and "stored" refer to a process for storing information on a computer medium. A skilled artisan can readily
adopt any of the presently known methods for recording information on a computer readable medium to generate
manufactures comprising one or more of the nucleic acid codes of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37,
41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and
77, one or more of the polypeptide codes of SEQ ID NOS. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66,
68, 72, 76, 80, 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78. Another aspect of the
present invention is a computer readable medium having recorded thereon at least 2, 5, 10, 15, or 20 nucleic acid codes of 
SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 
21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77. 

Another aspect of the invention is a computer readable medium having recorded thereon one or more of the 
nucleic acid codes of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, and
79. Another aspect of the present invention is a computer readable medium having recorded thereon at least 2, 5, 10, or
15 of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, and 79.

Another aspect of the present invention is a computer readable medium having recorded thereon one or more 
of the nucleic acid codes of SEQ ID NOs. 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77.
Another aspect of the present invention is a computer readable medium having recorded thereon at least 2, 5, 10, or 15 of
SEQ ID NOs. 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77.

Another aspect of the present invention is a computer readable medium having recorded thereon one or more 
of the polypeptide codes of SEQ ID NOS. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76,
80, 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78. Another aspect of the present
invention is a computer readable medium having recorded thereon one or more of the polypeptide codes of SEQ ID
NOS. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, and 80. Another aspect of the
present invention is a computer readable medium having recorded thereon one or more of the polypeptide codes of
SEQ ID NOS. 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78.

Another aspect of the present invention is a computer readable medium having recorded thereon at least 2, 5, 10, 
15, or 20 polypeptide codes of SEQ ID NOS. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76,
80, 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78. Another aspect of the present
invention is a computer readable medium having recorded thereon at least 2, 5, 10, or 15 polypeptide codes of SEQ ID
NOS. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, and 80. Another aspect of the
present invention is a computer readable medium having recorded thereon at least 2, 5, 10, or 15 polypeptide codes of SEQ
ID NOS. 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78.

Computer readable media include magnetically readable media, optically readable media, electronically readable 
media and magnetic/optical media. For example, the computer readable media may be a hard disk, a floppy disk, a 
magnetic tape, a CD-ROM, a Digital Versatile Disk (DVD), Random Access Memory (RAM), or Read Only Memory (ROM), as well
as other types of media known to those skilled in the art.

Embodiments of the present invention include systems, particularly computer systems which store and 
manipulate the sequence information described herein. One example of a computer system 100 is illustrated in block 
diagram form in Figure 3. As used herein, "a computer system" refers to the hardware components, software components,
and data storage components used to analyze the nucleotide sequences of the nucleic acid codes of SEQ ID NOs. 1, 2, 5,
9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43,
47, 49, 51, 53, 55, 69, 73 and 77 or the sequences of the polypeptide codes of SEQ ID NOs. 6, 10, 14, 26, 28, 30, 32, 34, 38,
42, 46, 58, 60, 62, 64, 66, 68, 72, 76, 80, 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and
78. The computer system 100 preferably includes a processor for processing, accessing and manipulating the sequence
data. The processor 105 can be any well-known type of central processing unit, such as the Pentium III from Intel
Corporation, or a similar processor from Sun, Motorola, Compaq or International Business Machines.

Preferably, the computer system 100 is a general purpose system that comprises the processor 105 and one or 
more internal data storage components 110 for storing data, and one or more data retrieving devices for retrieving the data
stored on the data storage components. A skilled artisan can readily appreciate that any one of the currently available
computer systems is suitable.

In one particular embodiment, the computer system 100 includes a processor 105 connected to a bus which is 
connected to a main memory 115 (preferably implemented as RAM) and one or more internal data storage devices 110, 
such as a hard drive and/or other computer readable media having data recorded thereon. In some embodiments, the
computer system 100 further includes one or more data retrieving devices 118 for reading the data stored on the internal
data storage devices 110.

The data retrieving device 118 may represent, for example, a floppy disk drive, a compact disk drive, a magnetic
tape drive, etc. In some embodiments, the internal data storage device 110 is a removable computer readable medium such
as a floppy disk, a compact disk, a magnetic tape, etc. containing control logic and/or data recorded thereon. The computer
system 100 may advantageously include or be programmed by appropriate software for reading the control logic and/or the
data from the data storage component once inserted in the data retrieving device.

The computer system 100 includes a display 120 which is used to display output to a computer user. It should 
also be noted that the computer system 100 can be linked to other computer systems 125a-c in a network or wide area
network to provide centralized access to the computer system 100.

Software for accessing and processing the nucleotide sequences of the nucleic acid codes of SEQ ID NOs. 1, 2,
5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39,
43, 47, 49, 51, 53, 55, 69, 73 and 77 or the polypeptide codes of SEQ ID NOs. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42,
46, 58, 60, 62, 64, 66, 68, 72, 76, 80, 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78
(such as search tools, compare tools, and modeling tools etc.) may reside in main memory 115 during execution.

In some embodiments, the computer system 100 may further comprise a sequence comparer for comparing the
above-described nucleic acid codes of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65,
67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77 or the polypeptide codes
of SEQ ID NOs. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, 80, 4, 8, 12, 16, 18, 20,
22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78 stored on a computer readable medium to reference nucleotide or
polypeptide sequences stored on a computer readable medium. A "sequence comparer" refers to one or more programs
which are implemented on the computer system 100 to compare a nucleotide sequence with other nucleotide sequences
and/or compounds stored within the data storage means. For example, the sequence comparer may compare the nucleotide
sequences of the nucleic acid codes of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65,
67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77 or the polypeptide codes
of SEQ ID NOs. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, 80, 4, 8, 12, 16, 18, 20, 
22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78 stored on a computer readable medium to reference sequences 
stored on a computer readable medium to identify homologies or structural motifs. Various sequence comparer programs 
identified elsewhere in this patent specification are particularly contemplated for use in this aspect of the invention. 
Protein and/or nucleic acid sequence homologies may be evaluated using any of the variety of sequence comparison 
algorithms and programs known in the art. Such algorithms and programs include, but are by no means limited to, 
TBLASTN, BLASTN, BLASTP, FASTA, TFASTA, and CLUSTALW (Pearson and Lipman, 1988, Proc. Natl. Acad. Sci.
USA 85:2444-2448; Altschul et al., 1990, J. Mol. Biol. 215(3):403-410; Thompson et al., 1994, Nucleic Acids
Res. 22(22):4673-4680; Higgins et al., 1996, Methods Enzymol. 266:383-402; Altschul et al., 1990, J. Mol. Biol.
215(3):403-410; Altschul et al., 1993, Nature Genetics 3:266-272).

In one embodiment, protein and nucleic acid sequence homologies are evaluated using the Basic Local 
Alignment Search Tool ("BLAST") which is well known in the art (see, e.g., Karlin and Altschul, 1990, Proc. Natl.
Acad. Sci. USA 87:2267-2268; Altschul et al., 1990, J. Mol. Biol. 215:403-410; Altschul et al., 1993, Nature
Genetics 3:266-272; Altschul et al., 1997, Nuc. Acids Res. 25:3389-3402). In particular, five specific BLAST
programs are used to perform the following tasks:

(1) BLASTP and BLAST3 compare an amino acid query sequence against a protein 
sequence database; 

(2) BLASTN compares a nucleotide query sequence against a nucleotide sequence 
database; 

(3) BLASTX compares the six-frame conceptual translation products of a query 
nucleotide sequence (both strands) against a protein sequence database; 

(4) TBLASTN compares a query protein sequence against a nucleotide sequence 
database translated in all six reading frames (both strands); and 

(5) TBLASTX compares the six-frame translations of a nucleotide query sequence 
against the six-frame translations of a nucleotide sequence database. 
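
The enumeration above amounts to a lookup from the type of the query and the type of the database to the applicable program. A minimal sketch of that lookup follows; the table contents are taken from the list above, while the function name `programs_for` and the labels "protein" and "nucleotide" are illustrative assumptions.

```python
# (program, query type, database type, level at which the comparison is made)
BLAST_PROGRAMS = [
    ("BLASTP", "protein", "protein", "protein"),
    ("BLAST3", "protein", "protein", "protein"),
    ("BLASTN", "nucleotide", "nucleotide", "nucleotide"),
    ("BLASTX", "nucleotide", "protein", "six-frame translation of the query"),
    ("TBLASTN", "protein", "nucleotide", "six-frame translation of the database"),
    ("TBLASTX", "nucleotide", "nucleotide", "six-frame translation of both"),
]


def programs_for(query_type, database_type):
    """Return the names of the programs that accept the given query and database types."""
    return [
        name
        for name, query, database, _level in BLAST_PROGRAMS
        if query == query_type and database == database_type
    ]
```

For example, a nucleotide query searched against a protein database maps to BLASTX, while a nucleotide query against a nucleotide database maps to either BLASTN or TBLASTX depending on whether the comparison is to be made at the nucleotide level or at the level of the translations.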

The BLAST programs identify homologous sequences by identifying similar segments, which are referred to 
herein as "high-scoring segment pairs," between a query amino acid or nucleic acid sequence and a test sequence which is
preferably obtained from a protein or nucleic acid sequence database. High-scoring segment pairs are preferably
identified (i.e., aligned) by means of a scoring matrix, many of which are known in the art. Preferably, the scoring
matrix used is the BLOSUM62 matrix (Gonnet et al., 1992, Science 255:1443-1445; Henikoff and Henikoff, 1993,
Proteins 17:49-61). Less preferably, the PAM or PAM250 matrices may also be used (see, e.g., Schwartz and
Dayhoff, eds., 1978, Matrices for Detecting Distance Relationships: Atlas of Protein Sequence and Structure,
Washington: National Biomedical Research Foundation). BLAST programs are accessible through the U.S. National
Library of Medicine, e.g., at www.ncbi.nlm.nih.gov.

The BLAST programs evaluate the statistical significance of all high-scoring segment pairs identified, and
preferably select those segments which satisfy a user-specified threshold of significance, such as a user-specified
percent homology. Preferably, the statistical significance of a high-scoring segment pair is evaluated using the
statistical significance formula of Karlin (see, e.g., Karlin and Altschul, 1990, Proc. Natl. Acad. Sci. USA 87:2267-2268).
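
The significance filtering described above can be sketched as follows, assuming the commonly cited form of the Karlin and Altschul statistic, E = K * m * n * exp(-lambda * S), where S is the raw score of a segment pair, m and n are the query and database lengths, and K and lambda are constants that depend on the scoring system. The function names and the values passed for K and lambda are placeholders chosen for illustration, not values taken from this specification.

```python
import math


def expect_value(raw_score, query_length, database_length, k, lam):
    """Expected number of chance segment pairs scoring at least raw_score."""
    return k * query_length * database_length * math.exp(-lam * raw_score)


def significant_segment_pairs(segment_pairs, query_length, database_length,
                              k, lam, max_expect):
    """Keep the (label, raw_score) pairs whose expect value is within the threshold."""
    return [
        (label, score)
        for label, score in segment_pairs
        if expect_value(score, query_length, database_length, k, lam) <= max_expect
    ]
```

A lower expect value indicates a more significant segment pair, so the user-specified threshold acts as an upper bound in this sketch.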

The parameters used with the above algorithms may be adapted depending on the sequence length and degree of 
homology studied. In some embodiments, the parameters may be the default parameters used by the algorithms in the
absence of instructions from the user.

Figure 4 is a flow diagram illustrating one embodiment of a process 200 for comparing a new nucleotide or 
protein sequence with a database of sequences in order to determine the homology levels between the new sequence and 
the sequences in the database. The database of sequences can be a private database stored within the computer system
100, or a public database such as GENBANK that is available through the Internet.

The process 200 begins at a start state 201 and then moves to a state 202 wherein the new sequence to be 
compared is stored to a memory in a computer system 100. As discussed above, the memory could be any type of 
memory, including RAM or an internal storage device. 

The process 200 then moves to a state 204 wherein a database of sequences is opened for analysis and
comparison. The process 200 then moves to a state 206 wherein the first sequence stored in the database is read into a
memory on the computer. A comparison is then performed at a state 210 to determine if the first sequence is the same as
the new sequence. It is important to note that this step is not limited to performing an exact comparison between the
new sequence and the first sequence in the database. Methods for comparing two nucleotide or protein sequences, even if
they are not identical, are well known to those of skill in the art. For example, gaps can be introduced into
one sequence in order to raise the homology level between the two tested sequences. The parameters that control
whether gaps or other features are introduced into a sequence during comparison are normally entered by the user of the
computer system.

Once a comparison of the two sequences has been performed at the state 210, a determination is made at a
decision state 212 whether the two sequences are the same. Of course, the term "same" is not limited to sequences that
are absolutely identical. Sequences that are within the homology parameters entered by the user will be marked as "same"
in the process 200.

If a determination is made that the two sequences are the same, the process 200 moves to a state 214 wherein 
the name of the sequence from the database is displayed to the user. This state notifies the user that the sequence with 
the displayed name fulfills the homology constraints that were entered. Once the name of the stored sequence is displayed
to the user, the process 200 moves to a decision state 218 wherein a determination is made whether more sequences exist
in the database. If no more sequences exist in the database, then the process 200 terminates at an end state 220.
However, if more sequences do exist in the database, then the process 200 moves to a state 224 wherein a pointer is 
moved to the next sequence in the database so that it can be compared to the new sequence. In this manner, the new 
sequence is aligned and compared with every sequence in the database. 

It should be noted that if a determination had been made at the decision state 212 that the sequences were not 
homologous, then the process 200 would move immediately to the decision state 218 in order to determine if any other 
sequences were available in the database for comparison. 
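
The loop of the process 200 can be sketched as below. This is a minimal illustration under stated assumptions: the database is modeled as a dictionary mapping sequence names to sequences, and the comparison at the state 210 is replaced by a simple ungapped, position-by-position identity measure (`identity_fraction`), which is a stand-in for the gapped comparison methods discussed above and not the method of any particular program.

```python
def identity_fraction(a, b):
    """Ungapped identity: matching positions divided by the longer length."""
    if not a or not b:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))


def search_database(new_sequence, database, threshold=0.9, compare=identity_fraction):
    """Return the names of stored sequences that count as "same" as new_sequence."""
    hits = []
    for name, stored_sequence in database.items():  # states 206 and 224
        level = compare(new_sequence, stored_sequence)  # state 210
        if level >= threshold:  # decision state 212
            hits.append(name)  # state 214: name reported to the user
    return hits  # end state 220 once no sequences remain
```

The `threshold` argument plays the role of the homology parameters entered by the user, and a different comparison function can be supplied through `compare` without changing the loop.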

Accordingly, one aspect of the present invention is a computer system comprising a processor, a data 
storage device having stored thereon a nucleic acid code of SEQ ID Nos. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 
45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77 
or the polypeptide codes of SEQ ID NOs. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, 80, 
4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78, a data storage device having retrievably 
stored thereon reference nucleotide sequences or polypeptide sequences to be compared to the nucleic acid code of 
SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19,
21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77 or the polypeptide codes of SEQ ID NOs. 6, 10, 14, 26, 28, 30,
32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, 80, 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56,
70, 74, and 78 and a sequence comparer for conducting the comparison. The sequence comparer may indicate a 
homology level between the sequences compared or identify structural motifs in the above described nucleic acid code 
of SEQ ID Nos. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 
19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77 or the polypeptide codes of SEQ ID NOs. 6, 10, 14, 26, 28, 
30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, 80, 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54,
56, 70, 74, and 78 or it may identify structural motifs in sequences which are compared to these nucleic acid codes 
and polypeptide codes. In some embodiments, the data storage device may have stored thereon the sequences of at 
least 2, 5, 10, 15, 20, 25, 30 or 40 or more of the nucleic acid codes of SEQ ID Nos. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 
37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73
and 77 or the polypeptide codes of SEQ ID NOs. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72,
76, 80, 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78.

Another aspect of the present invention is a method for determining the level of homology between a nucleic acid 
code of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 
17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77 or the polypeptide codes of SEQ ID NOs. 6, 10, 14, 26, 
28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, 80, 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 
54, 56, 70, 74, and 78 and a reference nucleotide sequence or polypeptide sequence, comprising the steps of reading 
the nucleic acid code or the polypeptide code and the reference nucleotide or polypeptide sequence through the use of a 
computer program which determines homology levels and determining homology between the nucleic acid code or
polypeptide code and the reference nucleotide or polypeptide sequence with the computer program. The computer program
may be any of a number of computer programs for determining homology levels, including those specifically enumerated
herein, including BLAST2N or BLASTN with the default parameters or with any modified parameters. The method may be
implemented using the computer systems described above. The method may also be performed by reading at least 2, 5, 10,
15, 20, 25, 30 or 40 or more of the above described nucleic acid codes of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31,
33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69,
73 and 77 or the polypeptide codes of SEQ ID NOs. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68,
72, 76, 80, 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78 through use of the computer
program and determining homology between the nucleic acid codes or polypeptide codes and reference nucleotide 
sequences or polypeptide sequences. 

Figure 5 is a flow diagram illustrating one embodiment of a process 250 in a computer for determining
whether two sequences are homologous. The process 250 begins at a start state 252 and then moves to a state 254
wherein a first sequence to be compared is stored to a memory. The second sequence to be compared is then stored
to a memory at a state 256. The process 250 then moves to a state 260 wherein the first character in the first
sequence is read and then to a state 262 wherein the first character of the second sequence is read. It should be
understood that if the sequence is a nucleotide sequence, then the character would normally be either A, T, C, G or U.
If the sequence is a protein sequence, then it is preferably in the single letter amino acid code so that the first and
second sequences can be easily compared.

A determination is then made at a decision state 264 whether the two characters are the same. If they are
the same, then the process 250 moves to a state 268 wherein the next characters in the first and second sequences
are read. A determination is then made whether the next characters are the same. If they are, then the process 250
continues this loop until two characters are not the same. If a determination is made that the next two characters are
not the same, the process 250 moves to a decision state 274 to determine whether there are any more characters
in either sequence to read.

If there are no more characters to read, then the process 250 moves to a state 276 wherein the level of
homology between the first and second sequences is displayed to the user. The level of homology is determined by
calculating the proportion of characters between the sequences that were the same out of the total number of
characters in the first sequence. Thus, if every character in a first 100 nucleotide sequence aligned with every
character in a second sequence, the homology level would be 100%.
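
The calculation performed by the process 250 can be sketched as follows. This is a minimal illustration that compares the two sequences position by position without introducing gaps and divides the number of matching characters by the length of the first sequence, as described above; the function name `homology_level` is an illustrative assumption.

```python
def homology_level(first, second):
    """Percentage of positions in `first` whose character matches `second`."""
    if not first:
        raise ValueError("the first sequence must contain at least one character")
    matches = 0
    # States 260 to 274: read the characters of both sequences in step.
    for a, b in zip(first.upper(), second.upper()):
        if a == b:  # decision state 264
            matches += 1
    # State 276: proportion of matching characters out of the first sequence.
    return 100.0 * matches / len(first)
```

Because the denominator is the length of the first sequence, positions of the first sequence that extend beyond the end of the second sequence count as mismatches in this sketch.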

Alternatively, the computer program may be a computer program which compares the nucleotide sequences of
the nucleic acid codes of the present invention to reference nucleotide sequences in order to determine whether the nucleic
acid code of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11,
15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77 differs from a reference nucleic acid sequence at one
or more positions. Optionally such a program records the length and identity of inserted, deleted or substituted nucleotides
with respect to the sequence of either the reference polynucleotide or the nucleic acid code of SEQ ID NOs. 1, 2, 5, 9, 13,
25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49,
51, 53, 55, 69, 73 and 77. In one embodiment, the computer program may be a program which determines whether the
nucleotide sequences of the nucleic acid codes of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59,
61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77 contain a
single nucleotide polymorphism (SNP) with respect to a reference nucleotide sequence.

Accordingly, another aspect of the present invention is a method for determining whether a nucleic acid code
of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17,
19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77 differs at one or more nucleotides from a reference
nucleotide sequence comprising the steps of reading the nucleic acid code and the reference nucleotide sequence
through use of a computer program which identifies differences between nucleic acid sequences and identifying
differences between the nucleic acid code and the reference nucleotide sequence with the computer program. In some
embodiments, the computer program is a program which identifies single nucleotide polymorphisms. The method may
be implemented by the computer systems described above and the method illustrated in Figure 6. The method may
also be performed by reading at least 2, 5, 10, 15, 20, 25, 30, or 40 of the nucleic acid codes of SEQ ID NOs. 1, 2, 5,
9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43,
47, 49, 51, 53, 55, 69, 73 and 77 and the reference nucleotide sequences through the use of the computer program
and identifying differences between the nucleic acid codes and the reference nucleotide sequences with the computer
program.
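
A program of the kind described above can be sketched as follows, under the assumption that the nucleic acid code and the reference sequence have already been aligned to equal length, with a "-" character marking each gap. The function names `find_differences` and `find_snps` and the gap convention are illustrative assumptions rather than features of any program named in this specification.

```python
def find_differences(aligned_code, aligned_reference, gap="-"):
    """List (column, kind, reference_base, code_base) for each differing column.

    Columns are numbered from 1. A gap in the reference is an insertion in the
    nucleic acid code; a gap in the code is a deletion; otherwise a substitution.
    """
    if len(aligned_code) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    differences = []
    pairs = zip(aligned_code.upper(), aligned_reference.upper())
    for column, (code_base, reference_base) in enumerate(pairs, start=1):
        if code_base == reference_base:
            continue
        if reference_base == gap:
            kind = "insertion"
        elif code_base == gap:
            kind = "deletion"
        else:
            kind = "substitution"
        differences.append((column, kind, reference_base, code_base))
    return differences


def find_snps(aligned_code, aligned_reference):
    """Single nucleotide substitutions only, as candidate SNP positions."""
    return [d for d in find_differences(aligned_code, aligned_reference)
            if d[1] == "substitution"]
```

Each gap column is reported separately in this sketch; the length of an inserted or deleted run can be obtained by grouping consecutive columns of the same kind.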

In other embodiments the computer based system may further comprise an identifier for identifying features 
within the nucleotide sequences of the nucleic acid codes of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 
45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77
or the polypeptide codes of SEQ ID NOs. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, 80,
4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78.

An "identifier" refers to one or more programs which identifies certain features within the above-described 
nucleotide sequences of the nucleic acid codes of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 
61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77 or the
polypeptide codes of SEQ ID NOs. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, 80, 4, 8,
12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78. In one embodiment, the identifier may
comprise a program which identifies an open reading frame in the nucleic acid codes of SEQ ID NOs. 1, 2, 5, 9, 13, 25,
27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51,
53, 55, 69, 73 and 77.
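
An identifier of open reading frames can be sketched as below. This is a minimal illustration under stated assumptions: only the three forward reading frames are scanned, an open reading frame is taken to run from an ATG codon to the first in-frame stop codon (TAA, TAG or TGA), and the function name `find_orfs` and its `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=1):
    """Return sorted (start, end) index pairs of forward-strand open reading frames.

    `start` is the index of the A of ATG and `end` is one past the stop codon,
    so sequence[start:end] is the open reading frame including its stop codon.
    `min_codons` is the minimum number of codons before the stop codon.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)
```

Scanning the reverse complement in the same way would cover the remaining three reading frames.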

Figure 7 is a flow diagram illustrating one embodiment of an identifier process 300 for detecting the 
presence of a feature in a sequence. The process 300 begins at a start state 302 and then moves to a state 304 
wherein a first sequence that is to be checked for features is stored to a memory 115 in the computer system 100.
The process 300 then moves to a state 306 wherein a database of sequence features is opened. Such a database
would include a list of each feature's attributes along with the name of the feature. For example, a feature name
could be "Initiation Codon" and the attribute would be "ATG". Another example would be the feature name "TAATAA
Box" and the feature attribute would be "TAATAA". An example of such a database is produced by the University of
Wisconsin Genetics Computer Group (www.gcg.com). Alternatively, the features may be structural polypeptide motifs
such as alpha helices, beta sheets, or functional polypeptide motifs such as enzymatic active sites, helix-turn-helix
motifs or other motifs known to those skilled in the art.

Once the database of features is opened at the state 306, the process 300 moves to a state 308 wherein 
the first feature is read from the database. A comparison of the attribute of the first feature with the first sequence 
is then made at a state 310. A determination is then made at a decision state 316 whether the attribute of the 
feature was found in the first sequence. If the attribute was found, then the process 300 moves to a state 318
wherein the name of the found feature is displayed to the user.

The process 300 then moves to a decision state 320 wherein a determination is made whether more
features exist in the database. If no more features exist, then the process 300 terminates at an end state 324.
However, if more features do exist in the database, then the process 300 reads the next sequence feature at a state 
326 and loops back to the state 310 wherein the attribute of the next feature is compared against the first sequence. 

It should be noted that if the feature attribute is not found in the first sequence at the decision state 316,
the process 300 moves directly to the decision state 320 in order to determine if any more features exist in the
database.
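
The identifier process 300 can be sketched as follows, assuming the database of features is modeled as a dictionary mapping each feature name to a literal attribute string, as in the "Initiation Codon" and "TAATAA Box" examples above. The function name `identify_features` and the dictionary representation are illustrative assumptions.

```python
FEATURE_DATABASE = {
    "Initiation Codon": "ATG",
    "TAATAA Box": "TAATAA",
}


def identify_features(sequence, feature_database=FEATURE_DATABASE):
    """Return the names of the features whose attribute occurs in the sequence."""
    sequence = sequence.upper()
    found = []
    for name, attribute in feature_database.items():  # states 308 and 326
        if attribute in sequence:  # state 310 and decision state 316
            found.append(name)  # state 318: name reported to the user
    return found  # end state 324 once no features remain
```

Features that are not simple substrings, such as the structural polypeptide motifs mentioned above, would need a richer attribute representation than the literal strings used in this sketch.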

Accordingly, another aspect of the present invention is a method of identifying a feature within the nucleic 
acid codes of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7,
11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77 or the polypeptide codes of SEQ ID NOs. 6, 10,
14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, 80, 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48,
50, 52, 54, 56, 70, 74, and 78 comprising reading the nucleic acid code(s) or polypeptide code(s) through the use of a
computer program which identifies features therein and identifying features within the nucleic acid code(s) with the
computer program. In one embodiment, the computer program comprises a computer program which identifies open
reading frames. The method may be performed by reading a single sequence or at least 2, 5, 10, 15, 20, 25, 30, or 40
of the nucleic acid codes of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71,
75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77 or the polypeptide codes of SEQ ID
NOs. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, 80, 4, 8, 12, 16, 18, 20, 22, 24, 36,
40, 44, 48, 50, 52, 54, 56, 70, 74, and 78 through the use of the computer program and identifying features within
the nucleic acid codes or polypeptide codes with the computer program.

The nucleic acid codes of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 
67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77 or the polypeptide codes
of SEQ ID NOs. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, 80, 4, 8, 12, 16, 18, 20,
22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78 may be stored and manipulated in a variety of data processor
programs in a variety of formats. For example, the nucleic acid codes of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33,
37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73
and 77 or the polypeptide codes of SEQ ID NOs. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72,
76, 80, 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78 may be stored as text in a word
processing file, such as Microsoft WORD or WORDPERFECT, or as an ASCII file in a variety of database programs familiar to
those of skill in the art, such as DB2, SYBASE, or ORACLE. In addition, many computer programs and databases may be
used as sequence comparers, identifiers, or sources of reference nucleotide sequences or polypeptide sequences to be
compared to the nucleic acid codes of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65,
67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77 or the polypeptide codes
of SEQ ID NOs. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, 80, 4, 8, 12, 16, 18, 20,
22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78. The following list is intended not to limit the invention but to
provide guidance to programs and databases which are useful with the nucleic acid codes of SEQ ID NOs. 1, 2, 5, 9, 13,
25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49,
51, 53, 55, 69, 73 and 77 or the polypeptide codes of SEQ ID NOs. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60,
62, 64, 66, 68, 72, 76, 80, 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78.

The programs and databases which may be used include, but are not limited to: MacPattern (EMBL),
DiscoveryBase (Molecular Applications Group), GeneMine (Molecular Applications Group), Look (Molecular Applications
Group), MacLook (Molecular Applications Group), BLAST and BLAST2 (NCBI), BLASTN and BLASTX (Altschul et al., J. Mol.
Biol. 215: 403 (1990)), FASTA (Pearson and Lipman, Proc. Natl. Acad. Sci. USA, 85: 2444 (1988)), FASTDB (Brutlag et
al., Comp. App. Biosci. 6:237-245, 1990), Catalyst (Molecular Simulations Inc.), Catalyst/SHAPE (Molecular Simulations
Inc.), Cerius2.DBAccess (Molecular Simulations Inc.), HypoGen (Molecular Simulations Inc.), Insight II (Molecular
Simulations Inc.), Discover (Molecular Simulations Inc.), CHARMm (Molecular Simulations Inc.), Felix (Molecular Simulations
Inc.), DelPhi (Molecular Simulations Inc.), QuanteMM (Molecular Simulations Inc.), Homology (Molecular Simulations Inc.),
Modeler (Molecular Simulations Inc.), ISIS (Molecular Simulations Inc.), Quanta/Protein Design (Molecular Simulations Inc.),
WebLab (Molecular Simulations Inc.), WebLab Diversity Explorer (Molecular Simulations Inc.), Gene Explorer (Molecular
Simulations Inc.), SeqFold (Molecular Simulations Inc.), the MDL Available Chemicals Directory database, the MDL Drug
Data Report database, the Comprehensive Medicinal Chemistry database, Derwent's World Drug Index database, the
BioByteMasterFile database, the Genbank database, and the Genseqn database. Many other programs and databases
would be apparent to one of skill in the art given the present disclosure.

Motifs which may be detected using the above programs include sequences encoding leucine zippers,
helix-turn-helix motifs, glycosylation sites, ubiquitination sites, alpha helices, and beta sheets, signal sequences encoding
signal peptides which direct the secretion of the encoded proteins, sequences implicated in transcription regulation
such as homeoboxes, acidic stretches, enzymatic active sites, substrate binding sites, and enzymatic cleavage sites.
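
As a simple illustration of motif detection of the kind listed above, the following sketch scans a polypeptide sequence for candidate N-linked glycosylation sites using the widely used consensus pattern N-{P}-[ST]-{P} (asparagine, any residue except proline, serine or threonine, any residue except proline). It is not one of the programs named above, and the function name find_n_glycosylation_sites is an assumption made for this example. A pattern match indicates only a candidate site.

```python
import re

# Consensus sequon for N-linked glycosylation: N-{P}-[ST]-{P}.
# The zero-width lookahead allows overlapping candidate sites to be reported.
N_GLYC = re.compile(r"(?=(N[^P][ST][^P]))")


def find_n_glycosylation_sites(protein):
    """Return 1-based positions of the asparagine in each candidate sequon."""
    return [m.start() + 1 for m in N_GLYC.finditer(protein.upper())]
```

Other motifs in the list, such as helix-turn-helix motifs or signal peptides, are generally not captured by a single short pattern and would call for profile-based or statistical methods.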

The present invention will be further described with reference to the following examples; however, it is to be
understood that the present invention is not limited to such examples.

In order to begin the physiological characterization of Cenarchaeum symbiosum, it was necessary to obtain
enriched preparations of Cenarchaeum symbiosum for use in the construction of genomic DNA libraries in
fosmid-based vectors. Genomic DNA libraries were constructed from two enriched preparations using the methods described
in Example 1 below.

Example 1 

Enrichment of Cenarchaeum symbiosum Cells
in Samples Obtained from Axinella mexicana
Enriched preparations of Cenarchaeum symbiosum for use in the preparation of the first fosmid genomic
DNA library were obtained essentially as described in Preston, C. M. et al. 1996. A psychrophilic crenarchaeon
inhabits a marine sponge: Cenarchaeum symbiosum gen. nov., sp. nov. Proc. Natl. Acad. Sci. USA 93, 6241-6246.
Briefly, a small individual of A. mexicana was incubated in calcium- and magnesium-free artificial seawater (ASW)
containing 0.25 mg/ml Pronase. The tissue was then homogenized and enriched for archaeal cells by differential
centrifugation.

Enriched preparations of Cenarchaeum symbiosum for use in preparing the second fosmid genomic DNA
library were obtained from a different sponge individual using the following improved enrichment procedure. A small
individual of A. mexicana was incubated in calcium- and magnesium-free artificial seawater (460 mM NaCl, 11 mM KCl,
7 mM Na2SO4, 2 mM NaHCO3) containing 0.25 mg/ml Pronase at room temperature for one hour. The sponge tissue
was rinsed in artificial seawater and homogenized in a blender. Large particles and spicules were removed by
low-speed centrifugation (4000 rpm, Sorvall GSA rotor at 4°C). The supernatant was next centrifuged at 5000 rpm for 5
min. at 4°C to remove large sponge cells, and the resulting supernatant was centrifuged at 10,000 rpm in a GSA rotor
at 4°C for 20 min. to collect the Cenarchaeum symbiosum cells. Following centrifugation, the recovered cell fraction
containing Cenarchaeum symbiosum was further incubated for 1 hr at 4°C in 10 mM Tris/HCl pH 8 and 200 mM
EDTA. The cells were then pelleted and subsequently purified on a 15% Percoll (Sigma) cushion in artificial seawater
centrifuged at 2500 rpm in a Beckman SS34 rotor. Archaeal cells banded in the light, upper fraction after
centrifugation. This cell fraction was washed in ASW and resuspended in TE buffer (10 mM Tris-HCl pH 8, 0.1 mM
EDTA). The additional incubation step was found to increase the lysis of sponge cells, which resulted in an enhanced
separation of archaeal and eukaryotic cells in the Percoll gradient.

Quantitative hybridization experiments were performed as described in DeLong, E. F. 1992. Archaea in 
coastal marine environments. Proc. Natl. Acad. Sci. 89, 5685-5689 using an oligonucleotide specific for archaea 
having the sequence GTGCTCCCCCGCCAATTCCT (SEQ ID NO: 115). These hybridization experiments indicated that 
25% to 30% of the total rRNA from this fraction was derived from archaea. 
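
For orientation, basic properties of the archaea-specific oligonucleotide given above (SEQ ID NO: 115) can be computed with a short sketch such as the following. The helper names reverse_complement and gc_fraction are assumptions made for this example, and the complementary site is written in the DNA alphabet even though the hybridization target is rRNA.

```python
# Archaea-specific probe sequence as given in the text (SEQ ID NO: 115).
PROBE = "GTGCTCCCCCGCCAATTCCT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Return the fraction of bases in the sequence that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# Sequence complementary to the probe, written 5'->3' in the DNA alphabet.
target_site = reverse_complement(PROBE)
```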

The enriched cell preparations were then utilized to construct fosmid libraries as described in Example 2
below.

Example 2

Construction of Fosmid Libraries
DNA was extracted from the enriched preparations of Example 1 and inserted into fosmids as described in
Preston, C. M. et al. 1996. A psychrophilic crenarchaeon inhabits a marine sponge: Cenarchaeum symbiosum gen.
nov., sp. nov. Proc. Natl. Acad. Sci. USA 93, 6241-6246 and Stein, J. L. et al. 1996. Characterization of uncultivated
prokaryotes: isolation and analysis of a 40-kilobase-pair genome fragment from a planktonic marine archaeon. J.
Bacteriol. 178, 591-599. A vertical cross section of sponge (0.5 g) was mechanically dissociated in 0.22 µm filtered,
autoclaved seawater using a tissue homogenizer. Cell lysis was accomplished by incubating the dissociated cells in 1
mg of lysozyme per ml for 30 min. at 37°C followed by an incubation for 30 min. at 55°C with 0.5 mg of proteinase K
per ml and 1% SDS. The tubes were finally placed in a boiling water bath for 60 sec to complete lysis. The protein
fraction was removed with two extractions with phenol:chloroform:isoamyl alcohol (50:49:1), pH 8.0, followed by a
chloroform:isoamyl alcohol (24:1) extraction. Nucleic acids were ethanol-precipitated and resuspended in TE buffer
(10 mM Tris-HCl/1 mM Na2-EDTA, pH 8.0). Approximately 5ng of DNA was purified by CsCl equilibrium density
gradient ultracentrifugation on a Beckman Optima tabletop ultracentrifuge using a TLA 100 rotor.

The genomic DNA obtained above was inserted into fosmids as follows. The genomic DNA was partially digested with
Sau3AI (Promega) and treated with heat-labile phosphatase (HK phosphatase; Epicentre). The partially digested
genomic DNA was ligated with pFOS (see U. J. Kim et al., Nucleic Acids Res. 20:1083-1085 (1992)) which had
previously been digested with AatII, phosphatase treated (HK phosphatase), and subsequently digested with BamHI.
The ligation mixture was used for in vitro packaging with the Gigapack XL packaging system (Stratagene) selecting
for DNA inserts of 35 to 45 kb. The phage particles were transfected into E. coli DH10B (Bethesda Research
Laboratories) and the cells were spread onto LB plates supplemented with 12.5 µg/ml chloramphenicol.

Example 3 

Identification of Fosmids Containing the Cenarchaeum symbiosum rRNA Operon
The fosmid libraries constructed above were screened to identify clones containing the rRNA operon. PCR
reactions were conducted on the library using primers known to amplify the rRNA operon.

The first fosmid library yielded seven unique clones, out of a total of 10,236 recombinant fosmids, which
contained the Cenarchaeum symbiosum rRNA operon. The second fosmid library yielded eight unique clones, out of a
total of 2100 recombinant fosmids, which contained the Cenarchaeum symbiosum rRNA operon.
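
The recovery frequencies implied by these clone counts can be compared with a short calculation such as the following sketch. Only the four counts are taken from the text; the variable and function names are assumptions made for this example, and the ratio is simple arithmetic on the reported counts rather than a statistical estimate.

```python
from fractions import Fraction

# Clone counts reported in the text for the two fosmid libraries.
libraries = {
    "library 1": {"positive": 7, "screened": 10236},
    "library 2": {"positive": 8, "screened": 2100},
}


def positive_frequency(positive, screened):
    """Exact fraction of screened clones that carried the rRNA operon."""
    return Fraction(positive, screened)


freq = {name: positive_frequency(**counts) for name, counts in libraries.items()}

# Ratio of the two frequencies (second library relative to the first).
fold_difference = freq["library 2"] / freq["library 1"]

for name, value in freq.items():
    print(f"{name}: {float(value):.3%} of clones contained the rRNA operon")
print(f"library 2 / library 1 frequency ratio: {float(fold_difference):.2f}")
```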

The sequences of the 16S rRNA genes in each of the 15 fosmids containing the Cenarchaeum symbiosum
rRNA operon were determined. The sequences of the small subunit rRNA genes of these 15 fosmids exhibited
variations with respect to one another. Ten of the fosmids contained a small subunit rRNA gene having the sequence
of the 16S rRNA gene in the insert of SEQ ID NO: 1, while the remaining fosmids contained a small subunit rRNA gene
having the sequence of the 16S rRNA gene in the insert of SEQ ID NO: 2. As discussed in more detail below, the
differences in the sequences of the rRNA genes may be used to determine whether a sample contains Cenarchaeum
symbiosum variant A or Cenarchaeum symbiosum variant B.

In addition to determining the sequences of the rRNA genes, the sequences adjacent to the rRNA genes were
also determined.

Example 4 
Fosmid Sequencing 

Partial restriction enzyme digests were conducted on two purified fosmids, fosmid 101G10 (which contains 
the variant A sequence) and fosmid 60A5 (which contains the variant B sequence). The partially digested DNA was 
used to construct plasmid libraries containing inserts of 1-2 kb. The resulting plasmids were sequenced using Applied 
Biosystems (ABI, Foster City, CA) Prism Dye-terminator FS reaction mix. Direct sequencing from fosmids was used 
for gap filling and resequencing to ensure accuracy. Fosmid sequencing was performed by using DNA from a single 3 
ml overnight culture purified on an Autogen 740 automated plasmid isolation system. Each reaction consisted of one 
preparation of DNA directly resuspended by the addition of 16 µl H2O, 8 µl oligonucleotide primer (1.4 pmol/µl) and 16
µl ABI Prism Dye-terminator FS reaction mix. Cycle sequencing was performed with a 96°C 3 min. preincubation
followed by 25 cycles of the sequence 96°C 20 sec. / 50°C 20 sec. / 60°C 4 min. and a 5 min. post-cycling
incubation at 60°C. Sequencing reaction products were analyzed on ABI 377 Prism Sequencers.

The complete sequences of the Cenarchaeum symbiosum derived inserts in the two fosmids are provided in
the accompanying sequence listing as SEQ ID NO: 1 (fosmid 101G10) and SEQ ID NO: 2 (fosmid 60A5). The insert of
fosmid 101G10 (SEQ ID NO: 1, designated variant A) was 32,998 bp and was syntenic over ca. 28 kbp with the
42,432 bp insert of fosmid 60A5 (SEQ ID NO: 2, designated variant B). Analysis of the common 28 kbp region is
shown in Fig. 1.

Although the sequences of both fosmids could be aligned unambiguously over most of the overlapping region,
four large insertion/deletions ranging in size from 142 bp to 1994 bp were identified between positions 20,500 and
25,800. The longest insertion contained a repetitive element of 1784 bp that was found in the sequence of SEQ ID
NO: 1 between menA and ORF05. It was composed of a 3-fold direct repeat of 575 bp (rep1 through 3 in Fig. 1), with
repeats exhibiting only minor sequence variation (95.8% to 98.7% identity).

A segment of 56 bp at the start of this repeat was also found adjacent to the 3' terminus of the third direct
repeat. No obvious structural or sequence similarities to known repeats or mobile genetic elements from other 
organisms were identified within the repeat sequence. Its occurrence in only one variant and its relatively low G+C 
content relative to the rest of the fragment suggest that it may have been acquired by horizontal transfer from a 
different genetic context. 

The sequenced regions contained several open reading frames or RNA encoding sequences. Some of the 
identified open reading frames encode proteins having homology to previously identified proteins. In particular, some 
of the open reading frames encode proteins involved in several metabolic pathways, providing insight into the 
physiology of Cenarchaeum symbiosum. 

An open reading frame which encodes a protein having homology to glutamate semialdehyde
aminotransferase (a protein involved in heme biosynthesis) was identified between nucleotides 7604-8908 of the
insert from fosmid 101G10 (SEQ ID NO: 1) and between nucleotides 23558-24682 of the insert from fosmid 60A5
(SEQ ID NO: 2). These open reading frames have been assigned SEQ ID NOs: 45 and 13 respectively in the
accompanying sequence listing, while the polypeptides they encode have been assigned SEQ ID NOs: 46 and 14
respectively in the accompanying sequence listing. A gene encoding glutamate semialdehyde aminotransferase has
also been detected in an rRNA operon-containing genomic fragment of a planktonic marine crenarchaeote (Stein, J. L. et
al. 1996. Characterization of uncultivated prokaryotes: isolation and analysis of a 40-kilobase-pair genome fragment
from a planktonic marine archaeon. J. Bacteriol. 178, 591-599).

An open reading frame encoding a protein having homology to triose-phosphate isomerase was identified
between nucleotides 13944-14612 of the insert from fosmid 101G10 (SEQ ID NO: 1) and between nucleotides 29655-30491 of
the insert from fosmid 60A5 (SEQ ID NO: 2). These open reading frames have been assigned SEQ ID NOs: 57 and 25
respectively in the accompanying sequence listing, while the polypeptides they encode have been assigned SEQ ID
NOs: 58 and 26 respectively in the accompanying sequence listing. This triosephosphate isomerase represents the
first such protein sequence reported in a crenarchaeote, and shares known archaeal signature sequences and deletions
which distinguish archaeal triosephosphate isomerase genes from their eucaryal and eubacterial homologues.

An open reading frame encoding a protein having homology to the TATA binding protein was identified
between nucleotides 14616-15164 of the insert from fosmid 101G10 (SEQ ID NO: 1) and between nucleotides 30501-31049 of
the insert from fosmid 60A5 (SEQ ID NO: 2) on the strands complementary to the insert strands provided in SEQ ID
NOs: 1 and 2. These open reading frames have been assigned SEQ ID NOs: 59 and 27 respectively in the
accompanying sequence listing, while the polypeptides they encode have been assigned SEQ ID NOs: 60 and 28
respectively in the accompanying sequence listing. This TATA box-binding protein (TBP) is similar to other known
archaeal TBPs and is N-terminally truncated with respect to the eukaryal homologs. It shares 49% amino acid
similarity with TBP from Pyrococcus woesei.

An open reading frame encoding a protein having homology to DNA polymerase (a protein involved in DNA
replication and repair) was identified between nucleotides 15488-18025 of the insert from fosmid 101G10 (SEQ ID
NO: 1) and between nucleotides 31371-33905 of the insert from fosmid 60A5 (SEQ ID NO: 2) on the strands
complementary to the insert strands provided in SEQ ID NOs: 1 and 2. These open reading frames have been assigned
SEQ ID NOs: 61 and 29 respectively in the accompanying sequence listing, while the polypeptides they encode have
been assigned SEQ ID NOs: 62 and 30 respectively in the accompanying sequence listing.
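
For open reading frames annotated in this way, the coding sequence can be recovered from the insert sequence by taking the stated interval and, where the open reading frame lies on the complementary strand, reverse-complementing it. The following is a minimal sketch under the assumption that the quoted coordinates are 1-based and inclusive; the function names are hypothetical and are not part of the sequence listing.

```python
# Sketch: recover an annotated ORF from an insert sequence.
# Assumption: coordinates are 1-based and inclusive, as is conventional
# for sequence listings; this is not stated explicitly in the text.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_orf(insert_seq, start, end, complementary=False):
    """Return the ORF read 5'->3' on its own coding strand."""
    if not (1 <= start <= end <= len(insert_seq)):
        raise ValueError("coordinates fall outside the insert")
    segment = insert_seq[start - 1:end]
    if complementary:
        segment = segment.translate(_COMPLEMENT)[::-1]
    return segment


def orf_length(start, end):
    """Length in nucleotides of a 1-based inclusive interval."""
    return end - start + 1


# Coordinates quoted in the text for the DNA polymerase ORFs.
POLYMERASE = {"101G10": (15488, 18025), "60A5": (31371, 33905)}
lengths = {fosmid: orf_length(*coords) for fosmid, coords in POLYMERASE.items()}
```

Under that assumption the two intervals are 2538 and 2535 nucleotides long, each a whole number of codons.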

The DNA polymerase of Cenarchaeum symbiosum has a high degree of similarity to the crenarchaeal
homologs from the extreme thermophiles Sulfolobus acidocaldarius and Pyrodictium occultum (54% and 53%, respectively)
and exhibits all conserved motifs of B-(α-)type DNA polymerases and 3'-5'-exonuclease motifs, both indicative of
archaeal polymerases. A more detailed phylogenetic analysis and biochemical characterization of the C. symbiosum
polymerase has been published elsewhere (Schleper, C., et al. 1997. Characterization of a DNA polymerase from the
uncultivated psychrophilic archaeon Cenarchaeum symbiosum. J. Bact. 179, 7803-7811).

An open reading frame which encodes a protein having homology to dCMP deaminase (a protein involved in
pyrimidine synthesis) was identified between nucleotides 18022-18663 of the insert from fosmid 101G10 (SEQ ID
NO: 1) and between nucleotides 33902-34456 of the insert from fosmid 60A5 (SEQ ID NO: 2) on the strands
complementary to the insert strands provided in SEQ ID NOs: 1 and 2. These open reading frames have been assigned
SEQ ID NOs: 63 and 31 respectively in the accompanying sequence listing, while the polypeptides they encode have
been assigned SEQ ID NOs: 64 and 32 respectively in the accompanying sequence listing.

An open reading frame encoding a protein having homology to the ATP dependent RNA helicase (a protein
involved in translation) was identified between nucleotides 18638-20149 of the insert from fosmid 101G10 (SEQ ID
NO: 1) and between nucleotides 34559-36067 of the insert from fosmid 60A5 (SEQ ID NO: 2). These open reading
frames have been assigned SEQ ID NOs: 65 and 33 respectively in the accompanying sequence listing, while the
polypeptides they encode have been assigned SEQ ID NOs: 66 and 34 respectively in the accompanying sequence
listing. The identified ATP RNA helicase is highly similar in sequence to homologues found in the genomic sequences
of three euryarchaeota (Bult, C., et al. 1996. Complete genome sequence of the methanogenic archaeon, Methanococcus
jannaschii. Science 273, 1058-1073; Klenk, H. P. et al. 1997. The complete genome sequence of the
hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus. Nature 390, 364-370; Smith, D. R., et al.
1997. Complete genome sequence of Methanobacterium thermoautotrophicum delta H: functional analysis and
comparative genomics. J. Bacteriol. 179, 7135-7155).

An open reading frame encoding a protein having homology to MenA (a protein involved in menaquinone
biosynthesis) was identified between nucleotides 20956-21834 of the insert from fosmid 101G10 (SEQ ID NO: 1)
and between nucleotides 37404-38282 of the insert from fosmid 60A5 (SEQ ID NO: 2). These open reading frames
have been assigned SEQ ID NOs: 71 and 37 respectively in the accompanying sequence listing, while the polypeptides
they encode have been assigned SEQ ID NOs: 72 and 38 respectively in the accompanying sequence listing.

An open reading frame encoding a protein having homology to the site specific DNA methyltransferase
proteins involved in restriction/modification was identified between nucleotides 26378-27454 of the insert from
fosmid 101G10 (SEQ ID NO: 1) and between nucleotides 40563-41669 of the insert from fosmid 60A5 (SEQ ID NO:
2) on the strands complementary to the insert strands provided in SEQ ID NOs: 1 and 2. These open reading frames
have been assigned SEQ ID NOs: 75 and 41 respectively in the accompanying sequence listing, while the polypeptides
they encode have been assigned SEQ ID NOs: 76 and 42 respectively in the accompanying sequence listing.

An open reading frame encoding a protein having homology to the histone H1 DNA binding protein was
identified between nucleotides 10625-1 134 of the insert from fosmid 60A5 (SEQ ID NO: 2). This open reading frame
has been assigned SEQ ID No: 5 in the accompanying sequence listing, while the polypeptide it encodes has been
assigned SEQ ID No: 6 in the accompanying sequence listing.

An open reading frame encoding a protein having homology to lysyl tRNA synthetase was identified between
nucleotides 13046-14620 of the insert from fosmid 60A5 (SEQ ID NO: 2). This open reading frame has been assigned
SEQ ID No: 9 in the accompanying sequence listing, while the polypeptide it encodes has been assigned SEQ ID No: 10
in the accompanying sequence listing.

A hypothetical open reading frame was identified between nucleotides 11478-13046 of the insert from
fosmid 60A5 (SEQ ID NO: 2). This open reading frame has been assigned SEQ ID No: 7 in the accompanying sequence
listing, while the polypeptide it encodes has been assigned SEQ ID No: 8 in the accompanying sequence listing.

An open reading frame encoding a protein having homology to peptidylprolyl cis/trans isomerase (a
chaperone) was identified between nucleotides 20156-20434 of the insert from fosmid 101G10 (SEQ ID NO: 1) on
the strand complementary to that provided in the sequence listing. This open reading frame has been assigned SEQ ID
No: 67 in the accompanying sequence listing, while the polypeptide it encodes has been assigned SEQ ID No: 68 in the
accompanying sequence listing.

An open reading frame encoding a protein having homology to glucose-1-dehydrogenase was identified
between nucleotides 28065-29843 of the insert from fosmid 101G10 (SEQ ID NO: 1). This open reading frame has
been assigned SEQ ID No: 79 in the accompanying sequence listing, while the polypeptide it encodes has been
assigned SEQ ID No: 80 in the accompanying sequence listing.

A hypothetical open reading frame designated Hypothetical 01 was identified between nucleotides 1358-2290
of the insert from fosmid 101G10 (SEQ ID NO: 1) and between nucleotides 17329-18213 of the insert from
fosmid 60A5 (SEQ ID NO: 2) on the strands complementary to the insert strands provided in SEQ ID NOs: 1 and 2.
These open reading frames have been assigned SEQ ID NOs: 43 and 11 respectively in the accompanying sequence
listing, while the polypeptides they encode have been assigned SEQ ID NOs: 44 and 12 respectively in the
accompanying sequence listing. A hypothetical open reading frame designated Hypothetical 02 was identified
between nucleotides 8961-9767 of the insert from fosmid 101G10 (SEQ ID NO: 1) and between nucleotides
24913-25728 of the insert from fosmid 60A5 (SEQ ID NO: 2). These open reading frames have been assigned SEQ ID NOs:
47 and 15 respectively in the accompanying sequence listing, while the polypeptides they encode have been assigned
SEQ ID NOs: 48 and 16 respectively in the accompanying sequence listing.

An open reading frame designated ORF 01 was identified between nucleotides 9772-10479 of the insert
from fosmid 101G10 (SEQ ID NO: 1) and between nucleotides 25732-26427 of the insert from fosmid 60A5 (SEQ ID
NO: 2) on the strands complementary to the insert strands provided in SEQ ID NOs: 1 and 2. These open reading
frames have been assigned SEQ ID NOs: 49 and 17 respectively in the accompanying sequence listing, while the
polypeptides they encode have been assigned SEQ ID NOs: 50 and 18 respectively in the accompanying sequence
listing.

An open reading frame designated ORF 02 was identified between nucleotides 10545-10922 of the insert
from fosmid 101G10 (SEQ ID NO: 1) and between nucleotides 26504-26881 of the insert from fosmid 60A5 (SEQ ID
NO: 2). These open reading frames have been assigned SEQ ID NOs: 51 and 19 respectively in the accompanying
sequence listing, while the polypeptides they encode have been assigned SEQ ID NOs: 52 and 20 respectively in the
accompanying sequence listing.

An open reading frame designated ORF 03 was identified between nucleotides 11382-11987 of the insert
from fosmid 101G10 (SEQ ID NO: 1) and between nucleotides 27337-27936 of the insert from fosmid 60A5 (SEQ ID
NO: 2) on the strands complementary to the insert strands provided in SEQ ID NOs: 1 and 2. These open reading
frames have been assigned SEQ ID NOs: 53 and 21 respectively in the accompanying sequence listing, while the
polypeptides they encode have been assigned SEQ ID NOs: 54 and 22 respectively in the accompanying sequence
listing.

An open reading frame designated ORF 04 was identified between nucleotides 12916-13737 of the insert
from fosmid 101G10 (SEQ ID NO: 1) and between nucleotides 28822-29631 of the insert from fosmid 60A5 (SEQ ID
NO: 2) on the strands complementary to the insert strands provided in SEQ ID NOs: 1 and 2. These open reading
frames have been assigned SEQ ID NOs: 55 and 23 respectively in the accompanying sequence listing, while the
polypeptides they encode have been assigned SEQ ID NOs: 56 and 24 respectively in the accompanying sequence
listing.

An open reading frame designated Hypothetical 03 was identified between nucleotides 20554-20955 of the
insert from fosmid 101G10 (SEQ ID NO: 1) and between nucleotides 37002-37403 of the insert from fosmid 60A5
(SEQ ID NO: 2). These open reading frames have been assigned SEQ ID NOs: 69 and 35 respectively in the
accompanying sequence listing, while the polypeptides they encode have been assigned SEQ ID NOs: 70 and 36
respectively in the accompanying sequence listing.

An open reading frame designated ORF 05 was identified between nucleotides 25151-26377 of the insert
from fosmid 101G10 (SEQ ID NO: 1) and between nucleotides 39454-40572 of the insert from fosmid 60A5 (SEQ ID
NO: 2). These open reading frames have been assigned SEQ ID NOs: 73 and 39 respectively in the accompanying
sequence listing, while the polypeptides they encode have been assigned SEQ ID NOs: 74 and 40 respectively in the
accompanying sequence listing.

An open reading frame encoding a protein with no homology to known proteins was identified between
nucleotides 3-10421 of the insert from fosmid 60A5 (SEQ ID NO: 2). This open reading frame has been assigned SEQ
ID No: 3 in the accompanying sequence listing, while the polypeptide it encodes has been assigned SEQ ID No: 4 in the
accompanying sequence listing.

An open reading frame designated ORF06 was identified between nucleotides 27535-28002 of the insert
from fosmid 101G10 (SEQ ID NO: 1). This open reading frame has been assigned SEQ ID No: 77 in the accompanying
sequence listing, while the polypeptide it encodes has been assigned SEQ ID No: 78 in the accompanying sequence
listing.

A gene coding for tRNA-Tyr was identified between nucleotides 12129-12251 of the insert from fosmid
101G10 (SEQ ID NO: 1) and between nucleotides 28058-28180 of the insert from fosmid 60A5 (SEQ ID NO: 2). This
tRNA contains a 45 bp intron in the vicinity of the anticodon loop.

Table 1 shows the level of homology between the open reading frames in the inserts from fosmid 101G10
and fosmid 60A5 at the nucleic acid level. Table 1 also shows the level of homology at the amino acid level between
the polypeptides encoded by the insert from fosmid 101G10 and fosmid 60A5. Nucleic acid homology was calculated
using BLASTN with the default parameters. Amino acid homology was calculated using FASTA with the parameters.
As shown in Table 1 and Fig. 1, the protein coding regions were highly similar in both nucleic acid and deduced amino
acid sequences.
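
The identity figures reported here were produced by BLASTN and FASTA, which compute their own alignments and scores. Purely to illustrate what a percent identity figure expresses, the following sketch computes identity over two sequences that have already been aligned. The function name percent_identity and the convention of excluding gapped columns are assumptions made for this example and do not reproduce either program.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free aligned columns in which the residues are identical.

    The two strings must already be aligned and of equal length. Columns
    with a gap in either sequence are excluded from the comparison, which
    is only one of several conventions in use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared
```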

Over the 28 kb common region in the 101G10 and 60A5 inserts, the inserts shared >99.2% identity in
their ribosomal RNA genes, approximately 87.8% overall DNA identity, an average of 91.6% similarity in ORF amino
acid sequence, and complete colinearity of protein encoding regions. As shown in Table 1, in protein coding regions
the DNA identity of the two contigs ranged from 80.9% (triose phosphate isomerase) to 91.5% (Hypothetical 03).
Within intergenic regions the identity dropped to 70-86%, and small insertions or deletions were found frequently.
The high similarity in coding regions and upstream sequences aided in the identification of genes, start codons, and
putative transcriptional promoter motifs (see below). Genes appear as densely packed in C. symbiosum as they are in
other sequenced archaeal genomes (Bult, C., et al. 1996. Complete genome sequence of the methanogenic archaeon,
Methanococcus jannaschii. Science 273, 1058-1073; Klenk, H. P. et al. 1997. The complete genome sequence of the
hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus. Nature 390, 364-370; Smith, D. R., et al.
1997. Complete genome sequence of Methanobacterium thermoautotrophicum delta H: functional analysis and
comparative genomics. J. Bacteriol. 179, 7135-7155).

The ribosomal RNA operon of Cenarchaeum symbiosum is composed of the genes for the 16S and 23S
rRNAs separated by a spacer of 131 bp. This organization is typical of crenarchaeotes, and differs from rRNA
operons of euryarchaeotes, which usually contain 5S RNA and tRNA genes (Garrett, R. A. et al. 1991. Archaeal
rRNA operons. TIBS 16, 22-26). The large subunit rRNA genes are located between nucleotides 2680-5674 of SEQ
ID NO: 1 (fosmid 101G10) and between nucleotides 18645-21639 of SEQ ID NO: 2 (fosmid 60A5). The small subunit
rRNA genes are located between nucleotides 5806-7278 of SEQ ID NO: 1 (on the opposite strand from that shown in
the Sequence Listing, as indicated in Figure 1) and between nucleotides 21771-23243 of SEQ ID NO: 2. The large
and small subunit rRNA genes in the two fosmids were 99.2% and 99.3% identical, respectively.
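
The stated spacer length can be checked against the quoted gene coordinates. The following sketch assumes that the coordinates are 1-based and inclusive; the dictionary and function names are hypothetical and were chosen for this example.

```python
# rRNA gene coordinates quoted in the text, assumed 1-based and inclusive.
RRNA = {
    "SEQ ID NO: 1": {"LSU": (2680, 5674), "SSU": (5806, 7278)},
    "SEQ ID NO: 2": {"LSU": (18645, 21639), "SSU": (21771, 23243)},
}


def interval_length(start, end):
    """Length in nucleotides of a 1-based inclusive interval."""
    return end - start + 1


def spacer_length(first, second):
    """Number of bases lying strictly between two non-overlapping intervals."""
    (_, end_first), (start_second, _) = sorted([first, second])
    return start_second - end_first - 1


summary = {
    name: {
        "LSU_bp": interval_length(*genes["LSU"]),
        "SSU_bp": interval_length(*genes["SSU"]),
        "spacer_bp": spacer_length(genes["LSU"], genes["SSU"]),
    }
    for name, genes in RRNA.items()
}
```

Under that assumption both inserts give a 131 bp spacer, in agreement with the text, with large and small subunit genes of 2995 and 1473 nucleotides respectively.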

As mentioned above, the sequences of the Cenarchaeum symbiosum derived inserts in fosmids 101G10 and
60A5 had a high degree of homology but were not completely identical. The sequence of the insert in fosmid 101G10
was designated variant A, while the sequence of the insert in fosmid 60A5 was designated variant B. Such sequence
differences could arise if the fosmid inserts were derived from two closely related but distinct strains of Cenarchaeum
symbiosum or, alternatively, the sequence differences could be due to cloning or sequencing artifacts. To confirm that
the fosmid inserts were in fact derived from two closely related strains, portions of the inserts in a plurality of
different fosmids were sequenced to determine whether they were identical to either of the inserts in fosmids
101G10 and 60A5, as would be the case if there were in fact two closely related strains of Cenarchaeum symbiosum.

In particular, the ribosomal RNA spacer regions of variant A and variant B contained 10 distinguishing
signature nucleotides and the 16S rRNA genes of variant A and variant B contained two distinguishing nucleotides.
Example 5 provides the results of a PCR based analysis of the 16S rRNA gene and the 16S-23S spacer region in 13
different fosmid inserts.

Example 5 

PCR Based Analysis of Fosmid Inserts to Determine

Whether They Contain the Variant A or Variant B Sequences
Primers 21F and 459R-LSU (CTTTCCCTCACGGTA, SEQ ID NO: 116) were used to amplify the 16S-23S
spacer region from the fosmids. The amplification products were sequenced using primer SP23rev
(CTATTGCCGTCTTTACACC, SEQ ID NO: 117).
PCR reactions with two archaea-specific 16S rDNA primers (21F and 958R; DeLong, E. F. 1992. Archaea in
coastal marine environments. Proc. Natl. Acad. Sci. 89, 5685-5689), one of which was biotinylated, were used to
amplify a 950 base pair (bp) fragment from the fosmids. The PCR products were purified and sequenced as described
in Preston, C. M. et al. 1996. A psychrophilic crenarchaeon inhabits a marine sponge: Cenarchaeum symbiosum gen.
nov., sp. nov. Proc. Natl. Acad. Sci. USA 93, 6241-6246 with the 16S rDNA primer 519R.
The results of this analysis are shown in Table 2. As shown in Table 2, in samples obtained from several
unique rRNA operon-containing fosmids, a sequence identical to either variant A (101G10) or variant B (60A5) was
present.
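
As a non-limiting sketch of the amplification step, the region bounded by a primer pair can be located in a template sequence in silico by finding the forward primer and the reverse complement of the reverse primer. The function names below, the exact-match requirement, and the use of the first occurrence of each primer are assumptions made for illustration; actual primer annealing tolerates mismatches that this sketch does not model.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Reverse primer given in Example 5 (SEQ ID NO: 116).
PRIMER_459R_LSU = "CTTTCCCTCACGGTA"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon(template: str, forward: str, reverse: str):
    """Return the region amplified by the primer pair (primers included).

    Returns None when either primer site is not found by exact matching.
    """
    template = template.upper()
    start = template.find(forward.upper())
    if start < 0:
        return None
    reverse_site = reverse_complement(reverse)
    # The reverse primer site must lie downstream of the forward primer.
    end = template.find(reverse_site, start + len(forward))
    if end < 0:
        return None
    return template[start:end + len(reverse_site)]
```

The length of the returned string corresponds to the expected size of the amplification product on a gel.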

The above methods may also be used to determine whether a biological sample contains variant A and/or
variant B. In such procedures, nucleic acids are obtained from the biological sample, amplified using the above
primers, and sequenced using the above oligonucleotide to determine whether the sample contains the variant A and/or
the variant B sequence.

Similarly, the amplification reaction may be conducted using any primers which generate amplification
products having sequences which differ between variant A and variant B. The amplification products may then be
sequenced to determine whether they have the sequence of variant A and/or variant B. In some embodiments, the
amplification reaction may be conducted under conditions in which the amplification primers specifically hybridize to
one of the variants.

RFLP analyses were also used to assess whether the fosmids contained the sequence of variant A or
variant B as described in Example 6 below.

Example 6 

RFLP Based Analysis of Fosmids to Determine Whether

They Contain the Variant A or Variant B Sequences
Primer set 21F (DeLong, E. F. 1992. Archaea in coastal marine environments. Proc. Natl. Acad. Sci. 89,
5685-5689) and 459R-LSU for the amplification of 2.2 kbp of the ribosomal operon, primer set GSAT810F
(GAATCCGCCCCCGACTATCTT, SEQ ID NO: 118) and 16S37REV (CATGGCTTAGTATCAATC, SEQ ID NO: 119) for
the amplification of the 16S RNA-GSAT region (2.2 kbp) and primer set Cenpol357F (ACITACAACGGIGACGAYTTTGA,
SEQ ID NO: 120) and Cenpol735R (CACCCCGAARTAGTTYTTYTT, SEQ ID NO: 121) for an internal DNA polymerase
fragment (of 1134 bp) were used in PCR reactions with 5 ng of purified fosmids. The PCR products were cut with
TaqI and HpaII (16S-23S RNA), HaeIII and RsaI (GSAT-16S RNA) or HaeIII and AvaII (polymerase) and analyzed on
2% agarose gels.
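
For illustration, the fragment sizes expected from digesting a linear PCR product with the enzymes named above can be predicted in silico as sketched below. The recognition sites and cut positions are the standard ones for these enzymes (TaqI T^CGA, HpaII C^CGG, HaeIII GG^CC, RsaI GT^AC, AvaII G^GWCC); the function name `fragment_lengths` and the simplification of reporting top-strand fragment lengths only are assumptions.

```python
import re

# Enzyme name -> (recognition pattern, cut offset from the start of the site).
ENZYMES = {
    "TaqI": ("TCGA", 1),       # T^CGA
    "HpaII": ("CCGG", 1),      # C^CGG
    "HaeIII": ("GGCC", 2),     # GG^CC
    "RsaI": ("GTAC", 2),       # GT^AC
    "AvaII": ("GG[AT]CC", 1),  # G^GWCC, W = A or T
}


def fragment_lengths(seq: str, enzymes) -> list:
    """Fragment lengths (largest first) after digesting a linear sequence.

    Passing one enzyme models a single digest; passing several models a
    combined digest with all of them.
    """
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        pattern, offset = ENZYMES[name]
        # A lookahead finds every site, including overlapping ones.
        for match in re.finditer(f"(?={pattern})", seq):
            cuts.add(match.start() + offset)
    bounds = [0] + sorted(cuts) + [len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)
```

Comparing the predicted fragment lists for the variant A and variant B sequences indicates which enzymes would be expected to give distinguishable band patterns.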

The results are shown in Table 2. If the pattern did not exactly match but closely resembled the RFLP of
either type A or B, it was assigned a lower case letter (a or b, Table 2), meaning that at least 3 out of 4 or 3 out of
5 bands created by restriction digest appeared identical in size to the ones from either type A or B. As shown in Table 2,
RFLP patterns of the 1150 bp fragment covering the 5'-end of the GSAT gene and 16S gene and the internal fragment
of 1134 bp from the DNA polymerase gene revealed that all fosmids analyzed could again be assigned to either the A
or B type, although slight variations were also detected (lower case letters in Table 2), suggesting that both variants
exhibit further microheterogeneity which is detectable in protein coding and intergenic regions.
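
The assignment rule described above may be sketched as follows. This is an illustrative reading only: the size tolerance `tol`, the function names, and the treatment of a pattern that resembles both reference patterns (reported here as "a/b") are assumptions, and any band sizes supplied to the functions are the caller's own data rather than values from Table 2.

```python
def shared_bands(observed, reference, tol=0):
    """Count reference bands matched one-to-one by an observed band.

    Two bands are treated as identical in size when they differ by at
    most tol base pairs (tol is an assumed parameter).
    """
    remaining = sorted(observed)
    shared = 0
    for band in sorted(reference):
        for i, obs in enumerate(remaining):
            if abs(obs - band) <= tol:
                shared += 1
                del remaining[i]  # each observed band is used once
                break
    return shared


def assign_type(observed, ref_a, ref_b, tol=0):
    """Return "A"/"B" for an identical pattern, "a"/"b" for a similar one.

    A pattern is called similar when at least 3 bands match a reference.
    Returns "a/b" when both references are similar, None when neither is.
    """
    similar = []
    for label, ref in (("A", ref_a), ("B", ref_b)):
        n = shared_bands(observed, ref, tol)
        if n == len(ref) == len(observed):
            return label  # every band matches: identical pattern
        if n >= 3:
            similar.append(label.lower())
    return "/".join(similar) if similar else None
```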

The above methods may also be used to determine whether a biological sample contains variant A and/or 
variant B. In such procedures, nucleic acids are obtained from the biological sample, amplified using the above 
primers, and digested as described above to determine whether the sample contains the variant A and/or the variant B 
sequence. Similar analyses may also be performed using other portions of the sequences of SEQ ID NOs: 1 and 2 
which are different from one another. 

To further confirm the existence of two closely related strains of Cenarchaeum symbiosum, biological 
samples were obtained from several individual sponges and analyzed to determine whether the samples contained 
variant A and/or variant B. Example 7 below provides the results of a PCR analysis of the Cenarchaeum symbiosum 
16S rRNA genes in samples obtained from several individual sponges in different locations and at different times.

Example 7 

Analysis of Samples from Individual Sponges 
The 16S rRNA genes of variant A and variant B differ at positions 175 and 183.7 (E. coli numbering). PCR
reactions with two archaea-specific 16S rDNA primers (21F and 958R; DeLong, E. F. 1992. Archaea in coastal marine
environments. Proc. Natl. Acad. Sci. 89, 5685-5689), one of which was biotinylated, were used to amplify a 950
base pair (bp) fragment from total nucleic acids derived from several different sponge individuals. The PCR products
were purified and sequenced as described in Preston, C. M. et al. 1996. A psychrophilic crenarchaeon inhabits a
marine sponge: Cenarchaeum symbiosum gen. nov., sp. nov. Proc. Natl. Acad. Sci. USA 93, 6241-6246 with primer
519R.

The amplification products were sequenced to determine whether they corresponded to variant A and/or 
variant B. The results are shown in Table 3. As shown in Table 3, in 15 out of 16 cases U/C ambiguities were found 
at the signature positions, indicating the presence of both variants in samples obtained from a single sponge (Table 3). 
Only one sponge (S4) yielded an unambiguous sequence identical to variant A, but variant B was detected in this 
individual by another criterion (see below). 
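
The interpretation of the signature positions can be sketched as follows, for illustration only. The mapping follows Table 3, in which fosmid 101G10 (variant A) shows U and fosmid 60A5 (variant B) shows C at both positions, and Y denotes a position at which both C and U are read. The function name and the choice to report the union of the calls at the two positions are assumptions.

```python
# Base call at a signature position -> variant(s) it indicates (per Table 3).
CALLS = {"U": {"A"}, "C": {"B"}, "Y": {"A", "B"}}


def variants_present(base_175: str, base_183_7: str) -> set:
    """Variants indicated by the base calls at positions 175 and 183.7.

    "-" (not determined) contributes nothing; T is accepted as U so that
    DNA-style base calls can be used directly.
    """
    found = set()
    for base in (base_175, base_183_7):
        base = base.upper().replace("T", "U")
        if base == "-":
            continue
        if base not in CALLS:
            raise ValueError(f"unexpected base call: {base!r}")
        found |= CALLS[base]
    return found
```

A sample giving Y at both positions is thus reported as containing both variants, whereas U at both positions is reported as variant A only.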
Hybridization analyses were also used to determine whether individual sponges harbored variant A and/or
variant B. The results of these analyses are provided in Example 8 below. 

Example 8

Hybridization Based Analysis of Samples Obtained from Axinella mexicana to Determine Whether the Samples Contain

Variant A and/or Variant B

Two oligonucleotides, one specific for each variant type, were designed from the 23S rDNA gene sequences of
fosmids 101G10 and 60A5. The probes differed in 3 positions and have the sequences ACACTTCAACTATTTCCTG
(SEQ ID NO: 122, variant A) and ACACTTTGACTATTTCGTG (SEQ ID NO: 123, variant B). Nucleic acid samples from
individual sponges (300 ng) and controls (fosmids 101G10 and 60A5, 50 ng each) were denatured, bound to nylon
membranes (Hybond-N, Amersham), hybridized with the labeled probes (Massana, R. et al. 1997. Vertical distribution
and phylogenetic characterization of marine planktonic Archaea in the Santa Barbara Channel. Appl. Env. Microb. 63,
50-56) and washed at 41.5 °C. Hybridization was analyzed by autoradiography.

The results are provided in Table 3. In the samples from the majority of host sponges examined,
the presence of both 23S rRNA variants was observed, confirming that the specific association of C. symbiosum with
its host typically involves the presence of both variants.
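
As a simple check of the probe design, the positions at which the two probe sequences differ can be listed as sketched below. The probe sequences are those of SEQ ID NOs: 122 and 123; the function name `mismatch_positions` is introduced here for illustration only.

```python
PROBE_A = "ACACTTCAACTATTTCCTG"  # SEQ ID NO: 122, variant A
PROBE_B = "ACACTTTGACTATTTCGTG"  # SEQ ID NO: 123, variant B


def mismatch_positions(a: str, b: str) -> list:
    """1-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]
```

Applied to the two probes, the function returns positions 7, 8 and 17 of the 19-nucleotide sequences, in agreement with the three differences stated above.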

The data provide strong evidence that these genomic clones are derived from two very closely related, but
distinct strains, as opposed to representing two ribosomal RNA operon regions originating from the same organism.
This conclusion is consistent with the observation that all crenarchaeota characterized to date contain only one
ribosomal RNA operon (Garrett, R. A. et al. 1991. Archaeal rRNA operons. TIBS 16, 22-26).

The high conservation between the inserts in fosmid 101G10 and fosmid 60A5 was not entirely confined to
coding regions but also extended into adjacent upstream sequences. Due to this upstream similarity, and also because
the average G+C content of the sequences was relatively high, it was possible to readily identify prospective
transcriptional (A+T rich) promoter elements. A motif corresponding to the consensus of the archaeal TATA-box-like
element (C/TT-TAT/A-A) (Hain, J. et al. 1992. Elements of an archaeal promoter defined by mutational analysis.
Nucl. Acids Res. 20, 5423-5428) was identified upstream of nearly all genes (Fig. 2). The exceptions were the genes
encoding MenA and DNA polymerase, which are located immediately downstream of other ORFs and may therefore be
transcribed as polycistronic mRNAs. In vivo and in vitro studies in other archaea have shown that initiation of
transcription occurs consistently 24 to 28 bp downstream from the central T of this motif (Hain, J. et al. 1992.
Elements of an archaeal promoter defined by mutational analysis. Nucl. Acids Res. 20, 5423-5428; Palmer, J. R. and
Daniels, C. J. 1995. In vivo definition of an archaeal promoter. J. Bacteriol. 177, 1844-1849). For twelve of the
protein encoding genes, the promoter element was found 25 to 30 bp upstream of the ORF (Fig. 2), suggesting that
transcriptional initiation occurs in close proximity to, or directly at, the translational start codon.
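
A search of this kind may be sketched as follows, for illustration only. The sketch assumes that the consensus printed above is to be read as (C/T)TTA(T/A)A, and it assumes that the distance to the ORF is measured from the first base of the motif to the first base of the start codon; neither convention is stated in the text, and the function name `tata_like_hits` is hypothetical.

```python
import re

# Assumed reading of the TATA-box-like consensus: (C/T)TTA(T/A)A.
# The lookahead allows overlapping occurrences to be reported.
TATA_LIKE = re.compile(r"(?=[CT]TTA[TA]A)")


def tata_like_hits(sequence: str, orf_start: int, min_dist: int = 25,
                   max_dist: int = 30) -> list:
    """Find TATA-box-like motifs lying min_dist..max_dist bp upstream of an ORF.

    orf_start is the 0-based index of the first base of the start codon.
    Returns (motif_start, distance) pairs, where distance is the assumed
    measure orf_start - motif_start.
    """
    upstream = sequence[:orf_start].upper()
    hits = []
    for match in TATA_LIKE.finditer(upstream):
        distance = orf_start - match.start()
        if min_dist <= distance <= max_dist:
            hits.append((match.start(), distance))
    return hits
```

Widening `min_dist` and `max_dist` allows candidate elements outside the 25 to 30 bp window to be inspected as well.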

A similar observation has been made for 30 of the predicted 100 strong and medium promoters from 156
kbp sequence of Sulfolobus solfataricus (Sensen, C. W. et al. 1996. Organizational characteristics and information
content of an archaeal genome: 156 kb of sequence from Sulfolobus solfataricus P2. Molec. Microb. 22, 175-191).
Transcription initiation at, or in close proximity to, the translational start codons has been mapped for some genes in
Halobacterium salinarium (Brown, J. W. et al. 1989. Gene structure, organization, and expression in archaebacteria.
CRC Crit. Rev. Microb. 16, 287-337) and S. solfataricus (Klenk, H. P. et al. 1993. Nucleotide sequence, transcription
and phylogeny of the gene encoding the superoxide dismutase of Sulfolobus acidocaldarius. Biochim. Biophys. Acta
1174, 95-98), and alternative mechanisms for initial mRNA-ribosome contact in Archaea have been hypothesized
(Brown, J. W. et al. 1989. Gene structure, organization, and expression in archaebacteria. CRC Crit. Rev. Microb. 16,
287-337).

The promoters listed in Figure 2, or fragments thereof, may be used in expression vectors or expression 
systems. In one embodiment, the promoters listed in Figure 2 may be operably linked to coding regions and introduced 
into archaebacteria, and in particular Cenarchaeum symbiosum, to express the encoded gene product in the 
archaebacterial cells. 

Alternatively, the promoters listed in Figure 2 may be operably linked to coding regions and introduced into 
host cells which are not normally capable of directing transcription from archaebacterial promoters. In addition, genes 
encoding the proteins required for transcription from these promoters are also introduced into the host cells. The 
genes encoding these transcription factors may be on the same vector as the promoter from Cenarchaeum symbiosum 
or on a different vector. In some embodiments, the genes encoding these transcription factors are linked to an 
inducible promoter. Expression of the transcription factors is induced when it is desired to express the proteins which 
are operably linked to the promoter from Cenarchaeum symbiosum. 

Although this invention has been described in terms of certain preferred embodiments, other embodiments which 
will be apparent to those of ordinary skill in the art in view of the disclosure herein are also within the scope of this 
invention. Accordingly, the scope of the invention is intended to be defined only by reference to the appended claims. 
Table 1

Comparison of Overlapping Coding Sequences from Fosmid 101G10
and Fosmid 60A5

                                                          % Identity
Gene Name(1)                  Functional Category         Nucleotide    Amino Acid

Hypothetical 01               unknown                     81.4          76.6
[illegible]                   translation
[illegible]                   translation
GSAT                          heme biosynthesis           83.2          [illegible]
Hypothetical 02               unknown                     [illegible]   [illegible]
ORF 01                        unknown                     [illegible]   [illegible]
ORF 02                        unknown                     [illegible]   99.1
ORF 03                        unknown                     [illegible]   [illegible]
[illegible]                   translation                 [illegible]
ORF 04                        unknown                     87.8          88.1
TIM                           glycolysis                  80.9          83.3
TBP                           transcription               83.4          86.3
DNA polymerase                replication/repair          89.0          93.9
dCMP deaminase                pyrimidine synthesis        85.7          89.8
RNA helicase (ATP dependent)  translation                 86.1          92.2
PPI                           chaperone                   88.4          92.5
Hypothetical 03               unknown                     91.5          92.4
MenA                          menaquinone biosynthesis    86            89.4
ORF 05                        unknown                     87.5          90.6
Methylase                     restriction/modification    86.4          87.5

(1) Hypothetical = open reading frame (ORF) with similarity to proteins of unknown function from the databases;
ORF = open reading frame identified by similarity between both fosmids, including upstream promoter sequence;
GSAT = glutamate semialdehyde aminotransferase; TIM = triose phosphate isomerase; TBP = TATA box-binding
protein; PPI = peptidylprolyl cis/trans isomerase.
Table 2

Analysis of Polymorphism at Four Distinct Loci in Different Fosmids

                                  16S-GSAT*3          DNA Pol*3
Fosmid    16S RNA*1   16S-23S     HaeIII    RsaI      HaeIII    AvaII
                      spacer*2

101G10    A           A           A         A         A         A
60A5      B           B           B         B         B         B
15A5      B           B                     -         b         b
43H4      A                                           A         A
60H6      A           A                               a/b       B
69H2      A                                           A         A
87F4      B                                           b         a/b
C1H5      A           A           A         A
C4H1      A           A           A         A
C4H9      A           A           A         A         A         B
C7D4      A           A           A         A         A         A
C8B8      B           B           B         B         B         b
C15A3     A           A           A         A
C17D2     B                       b         B         B         b
C20B5     A           A           a         a/b

*1: partial sequence (101G10 through 87F4) or RFLP analysis (C1H5 through C20B5).
*2: partial sequence.
*3: RFLP analysis of PCR products; A/B: identical pattern to either 101G10 (=A) or 60A5 (=B); a,b: similar pattern to
either A or B (see materials and methods). Fosmids C1H5, C4H1, C15A3 and C20B5 did not yield PCR products with
polymerase-specific primers.

The first seven fosmids were isolated from a first library, the last 8 fosmids (prefix C) are from a second library.
- = not determined.
Table 3

Detection of C. symbiosum Variants in Natural Populations of A. mexicana

A. mexicana Individual or    Variation in 16S       Variations in 23S rRNA
Isolated DNA Source*         rDNA Positions**       Hybridization
                             175       183.7        Variant Type A   Variant Type B

fosmid 101G10 from s12       U         U                             -
fosmid 60A5 from s12         C         C            -                +
s12                          Y         Y            +                +
s1                           -         -            +
s2                                     -            +
s3                           Y         Y            +                +
s4                           U         U            +                w
s5                           Y         Y            -                -
s6                           Y         Y            +                +
s7                           -         -            +                w
s8                           Y         Y            +                +
s9                           Y         Y            +                w
s10                          -                      +                +
s11                          Y         Y                             +
s13                          -         -            +                +
s14                          -         -            +                w
s16                          -                      +                +
s17                          -         -                             w
s18                          Y         Y                             w
s19                          -         -                             +
s20                                    -            +                +
s21                                    -            +                +
s22                                    -                             +
s23                          -         -            +
s24                                    -            +                +
s25                                    -                             +
s26                          -         -            +                +
s27                          -                      +                +
s28                          -         -            +                +
s29                                    -                             +
s30                                                                  +
hs1                                                 +                +
hs2                                                 +                +
hs3                          Y         Y            +                w
hs4                          Y         Y            +                w
hs5                          Y         Y            +                +
hh1                                                 w                w
hh2                          Y         Y            +                +
hh3                          Y         Y            +                +
Aq1                          Y         Y
Aq2                          Y         Y
Aq3                                                 +                +

*s = Naples Reef; hs = Haskle; hh = Hermit Hole; Aq = captive sponge.
**Y = direct sequence of PCR product yields C and U at the same position.
- = not determined; w = weakly positive.
WHAT IS CLAIMED IS:

1. An isolated, purified, or enriched nucleic acid comprising a sequence selected from the group 
consisting of SEQ ID NO: 1 and SEQ ID NO: 2, the sequences complementary to SEQ ID NO: 1 and SEQ ID NO: 2, 
fragments comprising at least 10 consecutive nucleotides of SEQ ID NO: 1 and SEQ ID NO: 2, and fragments 
comprising at least 10 consecutive nucleotides of the sequences complementary to SEQ ID NO: 1 and SEQ ID NO: 2. 

2. An isolated, purified, or enriched nucleic acid capable of hybridizing to the nucleic acid of Claim 1 
under conditions of high stringency. 

3. An isolated, purified, or enriched nucleic acid capable of hybridizing to the nucleic acid of Claim 1 
under conditions of moderate stringency. 

4. An isolated, purified, or enriched nucleic acid capable of hybridizing to the nucleic acid of Claim 1 
under conditions of low stringency. 

5. An isolated, purified, or enriched nucleic acid having at least 70% homology to the nucleic acid of 
Claim 1 as determined by analysis with BLASTN version 2.0 with the default parameters. 

6. An isolated, purified, or enriched nucleic acid having at least 99% homology to the nucleic acid of 
Claim 1 as determined by analysis with BLASTN version 2.0 with the default parameters. 

7. An isolated, purified, or enriched nucleic acid comprising a sequence selected from the group 
consisting of SEQ ID NOs: 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79 and the 
sequences complementary thereto. 

8. An isolated, purified, or enriched nucleic acid capable of hybridizing to the nucleic acid of Claim 7 
under conditions of high stringency. 

9. An isolated, purified, or enriched nucleic acid capable of hybridizing to the nucleic acid of Claim 7 
under conditions of moderate stringency. 

10. An isolated, purified, or enriched nucleic acid capable of hybridizing to the nucleic acid of Claim 7 
under conditions of low stringency. 

11. An isolated, purified, or enriched nucleic acid having at least 70% homology to the nucleic acid of 
Claim 7 as determined by analysis with BLASTN version 2.0 with the default parameters. 

12. An isolated, purified, or enriched nucleic acid having at least 99% homology to the nucleic acid of 
Claim 7 as determined by analysis with BLASTN version 2.0 with the default parameters. 

13. An isolated, purified, or enriched nucleic acid comprising at least 10 consecutive bases of a 
sequence selected from the group consisting of SEQ ID NOs: 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 
65, 67, 71, 75, 79 and the sequences complementary thereto. 

14. An isolated, purified, or enriched nucleic acid having at least 70% homology to the nucleic acid of 
Claim 13 as determined by analysis with BLASTN version 2.0 with the default parameters. 
15. An isolated, purified or enriched nucleic acid comprising a sequence selected from the group
consisting of SEQ ID NOs: 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73, 77 and the sequences 
complementary thereto. 

16. An isolated, purified, or enriched nucleic acid capable of hybridizing to the nucleic acid of Claim 15 
under conditions of high stringency. 

17. An isolated, purified, or enriched nucleic acid capable of hybridizing to the nucleic acid of Claim 15 
under conditions of moderate stringency. 

18. An isolated, purified, or enriched nucleic acid capable of hybridizing to the nucleic acid of Claim 15
under conditions of low stringency. 

19. An isolated, purified, or enriched nucleic acid having at least 70% homology to the nucleic acid of 
Claim 15 as determined by analysis with BLASTN version 2.0 with the default parameters. 

20. An isolated, purified, or enriched nucleic acid having at least 99% homology to the nucleic acid of 
Claim 15 as determined by analysis with BLASTN version 2.0 with the default parameters. 

21. An isolated, purified, or enriched nucleic acid comprising at least 10 consecutive bases of a 
sequence selected from the group consisting of SEQ ID NOs: 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53,
55, 69, 73, 77 and the sequences complementary thereto. 

22. An isolated, purified, or enriched nucleic acid having at least 70% homology to the nucleic acid of 
Claim 21 as determined by analysis with BLASTN version 2.0 with the default parameters. 

23. An isolated, purified, or enriched nucleic acid having at least 99% homology to the nucleic acid of 
Claim 21 as determined by analysis with BLASTN version 2.0 with the default parameters. 

24. An isolated, purified, or enriched nucleic acid encoding a polypeptide having a sequence selected 
from the group consisting of SEQ ID NOs: 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76,
and 80. 

25. An isolated, purified, or enriched nucleic acid encoding a polypeptide comprising at least 10 
consecutive amino acids of a polypeptide having a sequence selected from the group consisting of SEQ ID NOs: 6, 10, 
14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, and 80.

26. An isolated, purified, or enriched nucleic acid encoding a polypeptide having a sequence selected 
from the group consisting of SEQ ID NOs: 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 
78. 

27. An isolated, purified, or enriched nucleic acid encoding a polypeptide comprising at least 10 
consecutive amino acids of a polypeptide having a sequence selected from the group consisting of SEQ ID NOs: 4, 8, 
12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78. 

28. An isolated or purified polypeptide comprising a sequence selected from the group consisting of 
SEQ ID NOs: 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, and 80. 



WO 00/18909 



PCT/US99/22752 



29. An isolated or purified polypeptide comprising at least 10 consecutive amino acids of the 
polypeptides of Claim 28. 

30. An isolated or purified polypeptide having at least 70% homology to the polypeptide of Claim 28 as 
determined by analysis with FASTA version 3.0t78 with the default parameters. 

31. An isolated or purified polypeptide having at least 99% homology to the polypeptide of Claim 28 as
determined by analysis with FASTA version 3.0t78 with the default parameters. 

32. An isolated or purified polypeptide having at least 70% homology to the polypeptide of Claim 29 as 
determined by analysis with FASTA version 3.0t78 with the default parameters. 

33. An isolated or purified polypeptide having at least 99% homology to the polypeptide of Claim 29 as 
determined by analysis with FASTA version 3.0t78 with the default parameters. 

34. An isolated or purified polypeptide comprising a sequence selected from the group consisting of 
SEQ ID NOs: 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78. 

35. An isolated or purified polypeptide comprising at least 10 consecutive amino acids of the 
polypeptides of Claim 34. 

36. An isolated or purified polypeptide having at least 70% homology to the polypeptides of Claim 34 
as determined by analysis with FASTA version 3.0t78 with the default parameters. 

37. An isolated or purified polypeptide having at least 99% homology to the polypeptides of Claim 34 
as determined by analysis with FASTA version 3.0t78 with the default parameters. 

38. An isolated or purified polypeptide having at least 70% homology to the polypeptides of Claim 35 
as determined by analysis with FASTA version 3.0t78 with the default parameters. 

39. An isolated or purified polypeptide having at least 99% homology to the polypeptides of Claim 35 
as determined by analysis with FASTA version 3.0t78 with the default parameters. 

40. An isolated or purified antibody capable of specifically binding to a polypeptide comprising a 
sequence selected from the group consisting of SEQ ID NOs: 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 

64, 66, 68, 72, 76, and 80. 

41. An isolated or purified antibody capable of specifically binding to a polypeptide comprising at least 
10 consecutive amino acids of one of the polypeptides of SEQ ID NOs: 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 
60, 62, 64, 66, 68, 72, 76, and 80.

42. An isolated or purified antibody capable of specifically binding to a polypeptide having a sequence 
selected from the group consisting of SEQ ID NOs: 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 
74, and 78. 

43. An isolated or purified antibody capable of specifically binding to a polypeptide comprising at least 
10 consecutive amino acids of one of the polypeptides of SEQ ID NOs: 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 
50, 52, 54, 56, 70, 74, and 78. 
44. A method of making a polypeptide having a sequence selected from the group consisting of SEQ ID
NOs: 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, and 80 comprising introducing a
nucleic acid encoding said polypeptide, said nucleic acid being operably linked to a promoter, into a host cell. 

45. A method of making a polypeptide comprising at least 10 amino acids of a sequence selected from 
the group consisting of the sequences of SEQ ID NOs: 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 
68, 72, 76, and 80 comprising introducing a nucleic acid encoding said polypeptide, said nucleic acid being operably 
linked to a promoter, into a host cell. 

46. A method of making a polypeptide having a sequence selected from the group consisting of SEQ ID 
NOs: 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78 comprising introducing a nucleic 
acid encoding said polypeptide, said nucleic acid being operably linked to a promoter, into a host cell. 

47. A method of making a polypeptide comprising at least 10 amino acids of a sequence selected from 
the group consisting of the sequences of SEQ ID NOs: 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 
70, 74, and 78 comprising introducing a nucleic acid encoding said polypeptide, said nucleic acid being operably linked 
to a promoter, into a host cell. 

48. A method of generating a variant comprising: 

obtaining a nucleic acid comprising a sequence selected from the group consisting of SEQ ID NOs. 1, 2, 5, 9,
13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47,
49, 51, 53, 55, 69, 73 and 77, the sequences complementary to the sequences of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27,
29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53,
55, 69, 73 and 77, fragments comprising at least 30 consecutive nucleotides of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27,
29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53,
55, 69, 73 and 77, and fragments comprising at least 30 consecutive nucleotides of the sequences complementary to
SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19,
21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77; and

changing one or more nucleotides in said sequence to another nucleotide, deleting one or more nucleotides in 
said sequence, or adding one or more nucleotides to said sequence. 

49. The method of Claim 48, further comprising the step of testing the enzymatic properties of a 
translation product of said variant. 

50. A computer readable medium having stored thereon a sequence selected from the group consisting of a
nucleic acid code of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3,
7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77 and a polypeptide code of SEQ ID NOs. 6, 10,
14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, 80, 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48,
50, 52, 54, 56, 70, 74, and 78.

51. A computer system comprising a processor and a data storage device wherein said data storage device
has stored thereon a sequence selected from the group consisting of a nucleic acid code of SEQ ID NOs. 1, 2, 5, 9, 13, 25,
27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51,
53, 55, 69, 73 and 77 and a polypeptide code of SEQ ID NOs. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62,
64, 66, 68, 72, 76, 80, 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78.

52. The computer system of Claim 51 further comprising a sequence comparer and a data storage device
having reference sequences stored thereon.

53. The computer system of Claim 52 wherein said sequence comparer comprises a computer program
which indicates polymorphisms.

54. The computer system of Claim 51 further comprising an identifier which identifies features in said
sequence.

55. A method for comparing a first sequence to a reference sequence wherein said first sequence is
selected from the group consisting of a nucleic acid code of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45,
57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15, 17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77 and a
polypeptide code of SEQ ID NOs. 6, 10, 14, 26, 28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, 80, 4, 8,
12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52, 54, 56, 70, 74, and 78 comprising the steps of:

reading said first sequence and said reference sequence through use of a computer program which compares
sequences; and

determining differences between said first sequence and said reference sequence with said computer program.

56. The method of Claim 55, wherein said step of determining differences between the first sequence and
the reference sequence comprises identifying polymorphisms.

57. A method for identifying a feature in a sequence selected from the group consisting of a nucleic acid
code of SEQ ID NOs. 1, 2, 5, 9, 13, 25, 27, 29, 31, 33, 37, 41, 45, 57, 59, 61, 63, 65, 67, 71, 75, 79, 3, 7, 11, 15,
17, 19, 21, 23, 35, 39, 43, 47, 49, 51, 53, 55, 69, 73 and 77 and a polypeptide code of SEQ ID NOs. 6, 10, 14, 26,
28, 30, 32, 34, 38, 42, 46, 58, 60, 62, 64, 66, 68, 72, 76, 80, 4, 8, 12, 16, 18, 20, 22, 24, 36, 40, 44, 48, 50, 52,
54, 56, 70, 74, and 78 comprising the steps of:

reading said sequence through the use of a computer program which identifies features in sequences; and
identifying features in said sequence with said computer program.
SEQUENCE LISTING

<110> Diversa Corporation
Swanson, Ronald V.
Feldman, Robert A.
Schleper, Christa

<120> NUCLEIC ACIDS AND PROTEINS FROM CENARCHAEUM SYMBIOSUM

<130> DCORP.002VPC

<150> 60/102,294
<151> 1998-09-29

<160> 123

<170> FastSEQ for Windows Version 3.0

<210> 1
<211> 32998
<212> DNA
<213> Cenarchaeum symbiosum

<220>
<221> CDS
<222> (7604)...(8908)
<221> CDS
<222> (8961)...(9767)
<221> CDS
<222> (10545)...(10922)
<221> CDS
<222> (13944)...(14612)
<221> CDS
<222> (18638)...(20149)
<221> CDS
<222> (20554)...(20955)
<221> CDS
<222> (20956)...(21834)
<221> CDS
<222> (25151)...(26377)
<221> CDS
<222> (27535)...(28002)
<221> CDS
<222> (28065)...(29483)
<400> 1 

gatccttgac ctctgcgctt attgcagcca tggactgacc ggccgtgcgg ggctaaataa 60 
agctgaggcg ccgcctgcag gctctgctca gccgttgatt atacagtact cgcactcgca 120 
ggtcttgctt tcgtcatctt ttgccctgca cttggcgtgc tggcccgact ggcactggac 180

gcagatctcg tctatgttgt ctatcgaggt gtactttttg gactggtcaa agccgaagcc 240 

gccgtccttt ggtccaacca tgaagcggtc agggccgtta tgccttaatt atctttgccg 300 

tcggggcggg gccgcttctg ccggcgggag ccccccttga ggccgctccc cggctcgttg 360 

tcccacaggt gcatgctgcc cctgatcata aacgagccga ctatgattgc tacaagcccg 420 

cccactatca gtataaacag cctagccacg ccccccattc tgcccatgcg tgtaatatgc 480 

tgcacctgta aacaaacatt gcgcggggca tgggccgtcc ggacagacag aactgcccat 540 

gagacaggtg cctgcgggcc ggtaagctac attaatttat caccccccac gggcgggccc 600 

catgagcagg accaagagaa taatcatctg ggcatccata ctgggcgggg ggataatcta 660 

ctttctcgtc cagggcgaga ttgccagaaa tgtattgtcc tgagaacaga agccatgcag 720 

caactgcccg atcgtttttc aggtacctac tgggcttttt ggagttcgtt gatcaaaagc 780 

ggtacaggta ttctatacat gccagtcttg gctggaaaaa taaattgaag atcggcggat 840 

cctatctacg agcgcctgcc tgcttttcac ttgactagcc ggagtacttc gtctgcaatt 900 

tctgtatagc taggcattta tgtagttgag atacatgtcc gcgggatccg tttcatagtc 960 

tgaatcaaac accggatcat tccttctctt taattcctta aatgcctgat gcagttcaag 1020 

caggacggtt acatcatcgc gtttgattat ccgttctgtt gtttcagctt tttgctcatt 1080 

ctcatccatg attataggct acggtatttg actaaaaagg tttccatctt catgagtcgt 1140 

gtgtttgccg tggaattacc taccggggga acataaaaaa atgagtcata aacagcgcac 1200 

tgcatccacc gtggacccat gagacaacga gccggcagcg gtgcacagca ccgagtacaa 1260 

ccccgcaatc gtcttatcca acgacctgcc ctatgaaaac tccagacgga tctcttccgg 1320 

gcatccagta cccatgtatg tgagaattct gactatctta ccgtggtgtt ggcagtgtct 1380 

caacgacggt attaccgagg tgattcatga taattttgtt gactggagta tactgaaaca 1440 

tggtatttgc cgcatgaaat actgacagca ccagatcttg aaattcttgt tcatcttcca 1500 

gatcagttac ttttaagcca atccttttac attcttctct cgatatgcgc cttccatgac 1560 

tgtgatattt tttagaggaa gacattattt ctgatatttt ttttgacttt tttaccgctg 1620 

cagactcgtt ctcaaacatg tatctagcca gccatttttg tacaagttcc acacttagct 1680 

tctggctgct aatgcatttt tgaattagcc cgggaggata ttgccctaac agtggaagcc 1740 

atgcgccgag cctccccggg tgtttttctg acaccgactg tgcttcttga aactcgctaa 1800 

ttagaaactg tgcagacatt atgtgcatgc cggtcctggt tggaataata aattgggggt 1860 

cggtgggacc tatcgatgag tgtttaccca ttaccaggca attcgatgag catgcaagca 1920 

tcgcagctgc cgacatcgcg gcatatggga taatgatccg gacattttta aattttgcac 1980 

gtatgtatga gacgattgct tcggtggact cgacggagcc ccccggactg tggagtatta 2040 

aatctaattt cttagtcttt aaatcacgca tcatcctcat aaatccatac aggtcaccat 2100 

ttgttatgag agcttcatta gacgtatgcg cttcgtccgt tatccagttg gtcgcataca 2160 

gtattgtatc cctccccgaa tactgttgca attttgaaag atagtcgtga agtactacac 2220 

caggtgcctg ctcaccgact gactccatta gtcgaagtac gtcgtttgct atttctgcgt 2280 

aactaggcat ttatgtagtt gaaatgacta cccgcgggaa tcataccata gtctgtgtcg 2340 

tatgacttgc cttttttctt catcaatttc tcatattcct catgcagttc gagcaggatg 2400 

gccatgtcac tacgtttgtt cgtctgtttt gttgtctcgg gcttttggtc catattatta 2460 

tccatgctag taaaggacta tgttccttta aaaaggttcg tgattttaat ttccaagtgt 2520 

ttgcctcgca atttcctcca aggcacatga aaaacgggcc acaggcagag cacagcatcc 2580 

gctggggacc catgaaataa gcccccggcg gtgcacagca tccgctgggg gctcaataaa 2640 

aaaatgagtc atcatgcata gtctctatgt aaatggctga accggtgttt tggtcgatta 2700 

gtaaaggctg gctcaccact cgccgaagct tgtgggatac accaccttcc tatcaacgca 2760 

gtcttcttct gcgaaccttc atccgaagaa ggaatatctt gtctcgggat aggattcgtg 2820 

cttagatgct ttcagcactt agcctagatg gcttagctgc ccggcctgcc ctgtcggaca 2880 

accggtagac cagtggccac gcctctctgt tcctctcgta ctaagagcga cttcccctca 2940 

gatattcgcg cttccatcag gcagaggccg acctgtctca cgacggtcta aacccagctc 3000 

atgttccctt ttaataggcg agcagcctca cccttggccc ctgctgcagg accaggatag 3060 

gaaaagccga catcgaggta ccaaaccgcg gggtcgatag gagctctcgc ccgcgacgag 3120 

cctgttatcc ctggggtaat ttttctgtca cctccgggcc ccaatagtgg gcacacgaag 3180 

gatcgctaag ccagactttc gtctatgaat tccgtgcgtt tggaaatcca ttcagtctag 3240 

tttttggctt tgccctcttc agcggatttc tgacccgctt gaactaaact ttgggcccct 3300 

ttgatatctt ttcaaagggg tgccgcccca gccgaactgc ccacctgcac atgtccccgg 3360 

tcttcaccgg gtaagtggca ctgcaggaaa tgtctggtgt tacatcggcg tcccctgacg 3420 

tcccaaagaa cgccaggaaa tgactcccag atacgctatg cactccctgc tataccacaa 3480 

gcacaagctg cagtaaaact ccacggggtc ttctctcccc gatggaagat gatggactgt 3540 

tcgtccacct tatgtggctt caccgggttg taggcgggga cagtggggct ctcgttgttc 3600 
cattcatgca cgtcggaact tacccgacaa ggcatttggc taccttaaga gagtcagagt 3660 

tactcccggc gttaaccggt ccttagctcg gttgaaccca agttttagat accggcaccg 3720 

gccaggattc agcgactata catacccttt cgggctagca gtcgcctgtg tttttattaa 3780 

acagtcgaaa cccccttgtc actgcaacct gctgccgcca ttcctcatga cagctgcagg 3840 

catcccttat acctaagcta caggactaat ttgccgaatt ccctcgccat acggtatacc 3900 

cgtagcacct tagtttacta aaccagcgca cctgtgtcgg atctgggtac gaacttgcag 3960 

tttgctagcc gcacggtctt tcatggtctc ctggaatcgg gaaaactctg ctaacgcaaa 4020 

gccactcccg cctcgggcct gttctcgtca ttacgacact cccaggccct cgaacggttc 4080 

gacacgacga cggtcatgtt ccccctatcc ggaagcgaac catgcggttc aaacgctccc 4140 

tgcaaggtac cagaatatta actggtttcc cattcggact actctgttga ggcagtcctt 4200 

aggatcgact aactccaggc tgacgacgca ttgcctggaa acccttgcgc ttacggtggt 4260 

gcggattctc accgcactat gctgttactg ccaccaggat ctgcaataga aatcggtcca 4320 

caggacgtca ccgccctgct tcgtcccaat cactacgcca acctaccacg gtgcacctat 4380 

cacggtgcac gtctggagta tcggtactct gctttagccc cgtccgtttt tgtggcgccc 4440 

tcgctcggca ggtaagttgt tacacacttt ttgaaggata gctacttctg agcttacctc 4500 

cctgctgtct tggcgatgac acgcactttg gcttgacact tagcagaaat ttggggacct 4560 

taactccagt ctgggttaaa cccctctcgg tcgtgaacct tacgtcacac gaacccgtgt 4620 

ccatgcttct gcgatgtgta tccgttcgga gtttgaatgg atggtgagga atctcttccc 4680 

cgcgccaccc tatcagtgct ctaccggaaa caccatctcc acatagcacg ccctgcgaga 4740 

cgcttcggtt ggaactagca agcgccagtc tagattggtt tttgacccct attcccaagt 4800 

cacacaaacg agttgcacgt cagaactgct gcagacctcc agtgggcttt cgcccacctt 4860 

catcttgctc aggaatagat cgactggctt ctagccttac cgccatgact taacgcactt 4920 

tcacacgctt ctcctcacaa tgctgcgaga attcggtttc ccttcggcta cgcctttcta 4980 

ggcttaacct cgccatgaca gcaagctccc tggcccgtgt ttcgagacgg aacgcatgac 5040 

actgacgaca tgagctccgg actttcagct ccattgctgg aacctccggt ccgaaaaaat 5100 

cgtctttcat gccatgcacg tctgtaagca ataggtttca tgcacttttc accccccttc 5160 

cggggtactt ttcagctttc cctcacggta ctagtacact atcggtcttg agagatattt 5220 

agcctttgat gctactttca ccaatcttcg ctgcccactg ccaaggacaa ctactcgggt 5280 

gctggccctg ccccattcca cttcgtctag gggggtatca ccctctaagc cggaacattt 5340 

cagaacactt caactatttc ctggggccat tgcgccgcac caaaacacca catctcggcc 5400 

gcgttaccgc ggcagattca gtttgggctc tttccttttc gatcgcctct acttgggaaa 5460 

tctctattga tttctcttcc tcgtggtact aagatgcttc aattcccacg gttcgacctc 5520 

cgcttgcgcg gagtatacag gattcctatt cggaaatctc gggatcaacg ggtgcgtgca 5580 

cctaccccga gcttatcgca gcttgccacg tccttcttct ctcctcaagc ctagcaatcc 5640 

tcctattgcc gtctttacac cggcatattc agccacatat tacacgacta tgcatgatga 5700 

tcatcgcagt ccccagggga gggggccgct acatccttca tacaccactt gcgtggtgca 5760 

ttgcaccatg caaagatcat gtgcattctg ttcaaaccag tttctaagga ggtgatccga 5820 

ccgcaggttc ccctacggtc accttgttac gacttttccc ttgtcgctta cctcaagttc 5880 

gataacgcca attagacgtc acctcactaa aagcaaactt caatgaaacg acgggcggtg 5940 

tgtgcaagga gcagggacgt attcactgcg cggtaatgac gcgcggttac tagggattcc 6000 

agattcgtga gggcgagttg cagccctcag tcataactgt ggtagcgttt ggggattacc 6060 

tcctcctttc ggatatggaa cccattgtca ctaccattgc agcccgcgtg tggccccaga 6120 

gtttcggggc atactgacct gccgtggccc tttccttcct ccgcattaac tgcggcggtc 6180 

ccgctaattc gccccactgc tccggagagc aatggtggca actagaggca aggatctcgc 6240 

tcgttacctg acttaacagg acatctcacg gcacgagctg gcgacggcca tgcaccacct 6300 

ctcagcttgt ctggtagagt cttcagcttg accttcacac tgctgtctct ccgggtaaga 6360 

tttctggcgt tgactccaat tgaaccgcag gcttcacccc ttgtggtgct cccccgccaa 6420 

ttcctttaag tatcatactt gcgtacgtac ttcccaggcg gcaaacttaa cggcttccct 6480 

gcggcactgc actggctctt acgccaatgc atcactgagt ttgcattgtt tacagctggg 6540 

actacccggg tatctaatcc ggtttgctcc cccagctttc atccctcacc gtcggacgtg 6600 

ttctagtaga ccgccttcgc cacagggggt catcgataga tcagaggatt ttaccccttc 6660 

ctaccgagta ccgtctacct ctcccactcc ctagccgtgc agtatttccg gcagcctatg 6720 

cgttgagcgc atagatttaa ccgaaaactt acacggcagg ctacggatgc tttaggccca 6780 

ataatcctcc tgaccacttg aggtgctggt tttaccgcgg cggctgacac cagaacttgc 6840 

ccacccctta ttcgccggtg gttttaagac cggtaaaaga tttctttagc agaaaacact 6900 

cggattaacc ttgtcgtgct ttcgcacatt gcaaagtttt ctcgcctgct gcgccccata 6960 

gggcctgggt ccgtgtctca gtacccatct ccgggcctct cctctcagag cccgtatctg 7020 

ttatagcctt ggtgggccat tacctcacca acaagctgat agaccgcagt cccatcctac 7080 
ggcgataaat catttgggcc acaaaccatt ccaggcatag tggcctatcg gatattattc 7140 

tcagtttccc gaggttatcc ccgtccatag gttagattga ctacgtgtta ctgagccgtc 7200 

tgccttgtat tgctacaatg actcgcatgg cttagtatca atccgatagc agtcaggtcc 7260 

ggcaggatca accggattca taattggatt attttttttt tgttaagtac gcttgtactt 7320 

ttggaattga acagaatgca cataatcttc acatctcaga tatgaccctt cgatcatacc 7380 

ctcattctgt gtgcgtaact ggaggccagc gaatcacaat atggtacaat accatgcatt 7440 

catcgcaagc gccgctcttg cgtcacgtac gatcggatcg gcccgtccat gggcatataa 7500 

accatcgccg atttccgccc ccggcagccc cgatcagggg ccggatctgc ctgtatgatg 7560 

gcgatccgcc ctgattaaat tatgggggga gcggcctgct gccgtggatc tggaacgcga 7620 

gtacagggca aagaccggcg gctcggcccg gatctttgcc aggtcgaaaa agtaccacgt 7680 

cggcggggtc agccacaaca taaggttcta cgagccgtat ccgtttgtga caaggtccgc 7740 

gagcggcaag cacctcgtcg acgtggacgg gaacaagtat gtagactact ggatggggca 7800 

ctggagcctg atactggggc acgcgccggc gccagtcagg tcggcagtag aggggcagct 7860 

tcgccgcggc tggatccacg ggaccgtcaa cgagcagacg atgaatctct cggagataat 7920 

acgcggcgcg gtaagcgtgg cagaaaagac aaggtacgtc acgtcgggga cggaggccgt 7980 

catgtatgcg gcaaggctgg cgcgcgcgca tacgggcaga aaaataatag caaaggcgga 8040 

cggcggctgg cacgggtacg cgtcggggct gctcaagtcg gtcaactggc cgtatgatgt 8100 

gcccgagagc ggggggctcg tcgacgaaga gcactctata tccattccgt acaacgatct 8160 

tgaaggttcc ctggatgttc ttgggcgcgc aggcgacgac ttggcatgcg tgataatcga 8220 

gccgctgctg ggcggcggcg gctgcatacc ggcggatgag gactatctgc gcggcataca 8280 

ggagtttgtg cattcaaggg gcgcgctgct tgtcctcgac gagatagtga cagggttccg 8340 

gtttaggttt ggctgcgcgt atgctgcagc agggctggac cccgatatag tggcgctcgg 8400 

caagatagtc gggggcggat tccccatagg ggtgatatgc ggcaaggacg aggtgatgga 8460 

aatctccaac actatatcgc atgcaaagtc cgacagggcg tacatcggcg gcggcacatt 8520 

ctctgcaaac cccgccacga tgacagcggg cgcggcagcg ctcggggagc tcaaaaagag 8580 

aaagggcaca atatacccga ggataaactc catgggggac gacgcaaggg acaagctctc 8640 

aaagatattt gggaacaggg tatccgtgac cggaaggggc tcgctgttca tgactcactt 8700 

tgttcaagat ggcgccggca gggtctcaaa tgctgcagat gcggcagcct gcgatgttga 8760 

gctgctgcac aggtaccacc tggacatgat cacccgggac ggcatattct ttctgccggg 8820 

caagctgggg gccatatcgg cggcgcactc aaaggccgac ctcaagacca tgtattccgc 8880 

atcagagcgc tttgcagaag gcctatgagg tatagcgccg gaggaaactt tgattatacg 8940 

ggcgtgctgc cccggggccc atgatactct tcggcaagag cgaccccgcc gagctggtgc 9000 

gccaggcgga cctcctgtgc agcaagaacc agttcagggc ggcaataggc ctgtacggga 9060 

aaatcctcaa ggacgacccg cagaacaggg gcgtcctgca caaaaagggg ctggcccaga 9120 

acagggcaaa aaagtactct gatgcgatca cgtgctttga ccggctgctc gagcttgaca 9180 

acaaggacgc gcccgcgtac aacaacaagg ccatagccca ggccgagctc ggagacacgg 9240 

catccgcgct ggaaaactac ggcagggcca tcgaggccga cccgcggtac gcgccggcgc 9300 

gcttcaacag ggccgtgctg ctcgacaggc tgggcgagca tgaggaggcg ctgccggacc 9360 

tcgacagggc agccgagctg gaccgacgca agccgaaccc gaggttctac aaggggatag 9420 

tgctcggcaa gatgggcagg cacgaagagg cgctggcctg cttcaagggc gtgtgcaaga 9480 

ggcatcccgg ccacgccgac tcacagttcc acgtggggat agagcttacc gagcttggca 9540 

ggcacgccga ggccctcggg gagcttgcat cactgcccgc ggagcaccgc gagaacgcca 9600 

atgtattgta tgccagggcg cgcagcctct cgggccttgg cagggaggac gaatccatag 9660 

cgcacctgca aaaggcggcc aaaaaagatt ccaagacgat aaaaaagtgg gcccgcgcag 9720 

aaaaggcctt tgacggaata cgggacgatc ccggttcaaa aagatagccg gctagaggat 9780 

cttttttctt gccgcgtcaa tccgcatcat gcggaccttt tttttgggcc ccacaagtcg 9840 

cgattcatag actggtacat agaccacctc caccgccttt gcggcaaact cctcccgcag 9900 

gtcgcgcatg ccgtcaggcg ggggcccgcg cagcttctct tttagttttg agagcgcctc 9960 

ttctgtctcc acctcggggc tccgcacatt ctctgacgca tcgagtatcc tccgcgggta 10020 

cggctccacc gcgccgggcc ccgtcttgta gggaaagtcc gtctcgccgc cgtgccggtc 10080 

aaggcacatc atcccttctg attccgcaaa gacatgctct tctagctcga ggtcgaccct 10140 

gttcttgccg agccctgccg agagcgtctt gtgtatgcgc gacttggacc ttatgggaaa 10200 

gacgccgtcg cctagcacca cctcgatcac gttctggtcc accttgatcg ggtgaaccgc 10260 

ctttctgaaa aaatccgcag agtacctggc ggagacccgg atcagcgcct cgtggaccag 10320 

ctttacagaa tgcacatgga cgtcttcttt ccgcggggcc ctcataaggg ccctaaaggc 10380 

acccgtcttc tttgcctcta tcatggcccg agccgactcc tcagtcatgg cgttccgcag 10440 

gaccgccgtc ctggtctttc cagtcatccc ctgccgcacc ccgcataagg catactatac 10500 

aacgcaaggc aaggtaataa tagcctgccg tctgtaacgg ccgtatgagg tcggagggca 10560 
ggcccggata catcgaaaag ttcctaaaga gggcggacaa ggcgatagac aatgcagtcg 10620 

agcagggcgt caagagggca gacgagatac tagatgacgc agtcgagctc ggcaagatca 10680 

ccgtgggcga ggcgcaaaaa agaagcgatg tgctgctcaa gcaggccgag cgggagagca 10740 

agcggctcaa gtcaaggggc gccaaaaagc tcgaaaaggg cataggggcg gcaaaaaaga 10800 

tggcagccgg caagggcgac gcgctagaga ccctggcaaa gctcggcgag ctgagaaagg 10860 

cggggatcat aacggagaag gagtttcgcg ccaagaaaaa gaagcttctc gcggagatct 10920 

gacttgaagc cgctagacta tacccgggac ggctcgataa aggaggtcac aaagaggtgg 10980 

tttataggca cgccgtccct tgtcgacctt gcaggcgagc tcggcatatc tgagagcaac 11040 

atattccacg tgacatttcc cgacggcgca aagaccaccc tgcatacgca cgagggcggg 11100 

cagctgctca tagtgacctc gggcaccggc agcatgtcaa tatttgaaaa gaccggcgga 11160 

ggcgaggcgg aatttgcaat aaaagagaca gacaggatac cgctaaagca gggcagcatc 11220 

cagtacatac ctgccggcgt gctcacgtgc acggcgcaac agacggcacc accctgtccc 11280 

atatagcggt aaactacccg tcgccatcgg gaaaggagcc gtatacatta tggtatgaat 11340 

ccgactttgc cagccgggtc accggcgtgc tgtaaattat attatttgag cctctccagt 11400 

atcgacaggc ttacaaggtt ggtcatcgtt atccccttgc ggatcacttc ccttctagtc 11460 

ttttcgcagt acttgttgac gctctggtgg ctcttttcgc tggatacctc gagcaccaca 11520 

atgttcccgc tgtacgggaa ggcaaacttt ggcggaatcc tcgagcatac atccatgcgc 11580 

cctattcccg ccctgcctat cttctccttt cttgccacta ccgacgccac ctcgcgtatg 11640 

tcctcctcgg aatctccgta ttccagatag tacatggata catagctcat cccgggggat 11700 

tccctttcga atatctcctc gtccatgctg aataaatagg aagggccgcc cccgcggccc 11760 

tccaccgcct ttatcatttt ggggccgttt ttgaaccttg ccagcaggta gtggctggcc 11820 

tgctgctcaa agtacggctt gcttttggac gagaaagtgg cggtcacaaa gtacatctcg 11880 

cctacgttcc tcgacgagga ccattcctcc ttgagctcta tcgaggcatg gtcgtgcgtg 11940 

tatggcgtgc aggcatatcc ccccggggag gcctccgtct tggacatgca tgggatccgc 12000 

ggaaccggtt aatatctagt tccatgccgc cttggggggc gggggccccg cctgtggccg 12060 

gccccggggc aggcgtgcgt ggatccatgc gatagttatt taaaactagg atgccgatca 12120 

cggatcgtcc caagctagct cagcctggta gagcttccgg ctgtagatgt cggccttggc 12180 

tgaccgtata acagcatatc aggcatacag agaccgggtt gtcgaaggtt caattccttc 12240 

gcttgggacc acataaaact gccgcgggta caccgcgcat gccgctgcgc agtgcatgca 12300 

atgtgcccag tttgcccgcg ccgtgaaaga tggaattctg tccgtgcact gccgcatata 12360 

tgccgcggcg cgcctgcatg ttgtgccctg ctcgtacgcg caaatgtcag gagctgccgc 12420 

gccaaaagac ggcgcgttca ctgccgcgca tatgccgcgg ctgcatgctg tgccctgcct 12480 

atacacggaa agatcaggag ctgccgcgcc agaacactgc gcggcgcgtg cgccgcgcgg 12540 

cagggccgcg cccgtccgcc gcatcgcgcg accgggacct ctgccgctcc agcaatgtat 12600 

cgagcgccga gtcgtcgact agagtgcgcg ccggcaggcc gcctggcgtc ggcacgccct 12660 

gcatccccat ggcccggcgc atctcatcgt tctccctccg gagccggctc tccttctcat 12720 

caagcctgct gctcatcctg tcgagaaaca tcacatccga gttgtataga tccctgcgct 12780 

gctccatcat gcacagtatg tggcgcaatc gggactggtc gcatattccg gatgccatga 12840 

gctccatgac cccgtctttt gtgtgcccat tctgattccc cccgggccgc cttgcggccc 12900 

cgcgcatccc ggacctcatc gccgcttcct caggtattcc cggactatcc tgttggcaag 12960 

ccgggtctcg tctgtcccct cgcgctcggc cagcctggag agctttcttg cgccgttctt 13020 

gcccagctct attggtatct ttttcttgat gcccaccttg cgcatcctct ttagtattat 13080 

cttgtggccg gagcgggggc tctgggcaag caacctcagg tagatccgcc tcgacggcct 13140 

gtcgagcttt gctatctttg ataccacctt gagcgcctgg gatatggtcg gcaccgcctg 13200 

gtagagcctt gtcgcctcgt cccgggatat ggtgcccggt accatcgcct tgatcttgtc 13260 

cggtacgccc gcaaagccgt ggtatttctt gaacgtgggc atcgacatgc cgagcttttt 13320 

tgcggcctcg gattttgtcg tctgctcggc caggaacttg catgcgtctg caagctcccg 13380 

cgggctcatc tggagacggt gcaggttctc tacaaccgat gcggcctttg catcatccag 13440 

gccgtactct gtatccttgg ttatcaccag aaacttggac ttttttgcgc ccaggtactt 13500 

gagggccgca agccggtggt gccccgatat gaggaggtac agccccctgc cgcccctctg 13560 

tatgacgggc gggttctgca gcccctctga tctgatcgac tttgcgatat cccgcacccg 13620 

ggacctgtcc agcctccttg cctgcgcctc cttccacaca tgcacatttt tgaggggcac 13680 

ctcgcggagg gtctgcttta tcttgggctt gtagcgccga accaacgtac ttttcaagat 13740 

gcggatcctt gttaactgtg tttggtaagt ttatcacaac aattaggtta gatagagctg 13800 

ttcccacgcg gcaatcccct gtatacgcac gcaaatccgc gcatactccc ccgggaggcg 13860 

ttctggggcc ccggggctca cgagcccgga acctggggtg ccccgcgggg gcgtcgatag 13920 

aataaatacg cgcagggggc cccgtggcgc gatcgcccgt gctgataata aactgcaaaa 13980 

actacaagga ggcggccggc ggcagaattg acagcctagc ggcggcagcc gccggggcgg 14040 
ccgcaaaata cggcgtcagg atagctcttg ccccgccgca gcacctgctg ggcgcagtaa 14100 

agggggaaga tcttacagtt ctggcgcagc atatagacga caagggggtt ggaagcacca 14160 

caggatatgt cgtgccggag ctgctgggag aatccggcgt ctctggcgcg ctcatcaacc 14220 

acagcgagca ccgcgtatca gctgaccagg tggcaagcct tgtgcccagg ctcaggggtc 14280 

tggatatgat ctccgtggtc tgtgtaaagg attccgccga ggcggcaaat ctctcccggc 14340 

accggcccga ctacatagct atcgagcctc ccgagctgat aggctcgggc aggtccgtct 14400 

catcggagag gcccgagctg ataggggagg cagcagaggc catcaggggg gcggatggaa 14460 

caaagctgct ctgcggggcg ggcataacat caggcgctga tgtgcgcaag gccctcgagc 14520 

tcggctccaa ggggatcctc gtggcaagcg gggtggtaaa atcatcagac cccgctgcgg 14580 

ccatagccga gctggcacag gccatgtcct gagtactagg cccccgcgtt attgaggcgc 14640 

gtcagcaggt caaacgacga cctgcgcagc tcatccggcg acttggcgcc cgctatcacc 14700 

atctttcccg acgcaaagac tagaaagctg cagctgtcca gccccttgag tatcatcccg 14760 

ggaaacgacc cgggatcata tacagcgcca ggcatgcgcg acgatatcct gtctatggga 14820 

acattcctac cggcatccac cgtggctaca atattgcgca cgacgggcct tatcttgcag 14880 

tcgccggcag ccccgttgcg caccaggtgg agccgcgcct cgtgcagctg cccaaacgag 14940 

gccctcacgg atctggcgcc gacggatatc atcttgccag aaatgaatac agtcaccctc 15000 

ccctgcatgc cgggcgtctt tatgtagccg cacctgccgc cgtatacggc ctcatcatac 15060 

atgcagcacg gcatggcggc catctttttt gcgctcaccc tttgtacaag gtctgatgtg 15120 

ctgacgacat tgacgacccg gggccgcgtc cggggatcca gcattggtca ggatccgccc 15180 

gtgcgctatt ttaaccgggg cccgggcggc cgcctcgcca tcttgtcata cttgcgcttc 15240 

atcaaaatta cagtgaccat cagggttatg ccggccacgt tggtccctat tatgtagacg 15300 

tcccatatgt gcaccccgta tgttatccag agcacagagc cagcccctat gaacatggtt 15360 

agataccacg agacgtccct gaggctcttt gtcttgtacg ccttgattat ctggtgcacc 15420 

catccggaga gtatcagtac gccgccggcc acggccacga catccagcag tgctatatcc 15480 

acggtggtca tttgaaaaag aactgctcca ttccagtctg ctttggcttg cccagcatct 15540 

cgtcaaagtc aaggcccatg gacgaggtga gctggtccag agtagactcc atgaactcta 15600 

gatactttga cgtgtccacc tctcctgcct gggccatctc gacaggcttg acgcctgtct 15660 

tgttcatcac ctttacgtac gatattatgt cgcctttttt gacctccctt gcgttctcga 15720 

gcagtctggc cgcccgtatg tgctgcggga cggtctttac gtattcagag ggcgccttgc 15780 

ttatcatcac attgaacgcc agatccgcca gcgggacccc cctctcctcc agcctcttcc 15840 

cggatgccgc tatggccttt gagatcttta gctttgccga ttcaaactcg tcctcggtct 15900 

gtacagccga cagtatgtcg agcagcgaat agaacagctc ctttatgaac gggggcgtgt 15960 

gcgacttttt ccccgtcagg cccttgacgt cgaccttgcc ggactttgtc accccgaaat 16020 

agtttttctt cctgttagat agcacgacat acctgtactc tttgtccacc tcgagctcga 16080 

cgccgtgctc ctttttggcg tgctcgacta tatcatggat ctgccgctcc tctggattct 16140 

ttatgaacag cgaatcggtg tccccgtaca gcacctttac gcccatctgc tcgcagtgcg 16200 

atatggtctg catgatgata tagcgcccga ccgccgtggt ggcctcggcg gcaggcagaa 16260 

agtacagcgg gaatatctcg gcgcccatca ccccgtagct tgcgtttagc accaccttga 16320 

gggcctggct gatcacagta tactgctgcc gctgctcctc cgttatagac tggctctttg 16380 

agaggctctt gtaatagttg acgcgcaggt cgcggagcga tcctattatc atcgatgtaa 16440 

gcccgttgtt tttcgtgcat acccagtggt tggtatcggg gatggtgttc tttctgcatt 16500 

cgggatgaac gcacctgacg gtctcgtacg agaggtttcg cacctttatt atgctaggat 16560 

acaggcttgc aaaatccata actgtaacat caaagtgtat gccctcttca ggctcgacta 16620 

cgagaccacc gcggaacttt ttgtccttga ttacggcgtc gttgcttacc tgttgagacc 16680 

tcttttccag ctcgtccctg cggggtatca gcgcgttgcg ctgcctgtgc tcatagtaca 16740 

gcaggctcct tatccactgc gagacgccca tgcgggacat atcatcgatg ggcatccggg 16800 

caatcctgct ggtcaccacc aggaggtcca tcagtatctc gttcccaaag gtgctaagct 16860 

ccagcgtcag gcgcgcgtca tgatagcaat agtttgcagt ctggtataga gtgagatccc 16920 

cgagagacac gccataatcg accttgccct cgccgagcat cgccttggac acgctgttca 16980 

gggagtaatc tgtatacttt gccgcaaatg catacagctg gaacgacctg ttcgagaagg 17040 

tcctgtacag gtccagatgg acgccgtgcc ggagcgtggc cgaatcccgc atcatgtaca 17100 

ggggtatgtc ggaatccgcc acgccgaggc gccgggcccg attgtacatg tacggcatgt 17160 

caaagtcgtc cccgttgtat gtaagcacaa acgggtacga gcctattatt gctagcgcgt 17220 

cgcggatcat gtccgcctcc ttgtcctcgt cgtagaacac cacctcgacc ccgggggtca 17280 

catcgtttgc gccctcgtcc gcgccgctct tcaggacaag gacctttctg aggccgtcgg 17340 

tggcggcaaa ccccactgct gtgaccctcc tgtccgagat cttggcatcg gggatcctgc 17400 

cctcctctga atccacctcg atgtcaaagc tgaggcgcct tatccggggt atgggctggt 17460 

tgagcaggtc cgcccacccc gctatgaact cgcggaactc tttcctgtcg gccatgccct 17520 
cgtctatgag cttgtcccag agaaggctct tgagggccag ttttacctcg tcggatattg 17580 

gcatgtcatg cggaatcacc tccccgcctg ataccgaata gtacctgccc actaccaggc 17640 

ccgcgtcata cagatagttc tcgtaatact ttatgtcgga ttcccacgtg tctatcacgt 17700 

ttctgatgct cttctccgag tgggtcccgc ctatcgcaag aggatcagag acggttatct 17760 

tggagacggg cacctccttg tcggctatca ggtcgtgccg catgacctgc tctatcccga 17820 

gcacgtcctc cctgccccca agaaagccga gctcggaggg cggcagcctc gtataacagt 17880 

agggcttgtg ccccgtgttg tccgtccagt ggatgatctt ttgcgattcc gactcgtaga 17940 

acttgaggac cacggccctt gcctggccat cataggttgc agatacaagc agtgacgggg 18000 

gaatctcttc atcctgtgca gtcagcgcac cggcacctcc tttgttcccc cgggcatcct 18060 

tgacgcccag tatgtgatct tctccttgtc catcatggtt atgctggccg atgcatcttt 18120 

taccagcttg gaggcgttct ccggatatgt atccaggcag acaaaccgcc tgattcctat 18180 

ggtcaccgcc atctttgtgc actctaaaca cggagagaac gtcgtataca tggtggcgtt 18240 

gcctccccct gcgcctattc ccagtatcgc acagtgcatt atcgcgttgg cctctgcatg 18300 

gttgcacagg caccggtcca ggccctcgcc tgagcggatc ttgccctcca tgcgctctat 18360 

gcacctttcg cacccgccct cgaagcagtt ctttacgccg gggggcgtcc cgttgtatcc 18420 

tgtggccagc tgcctgtgat ccctgactat gacggccccc acctttctga ccatgcagtt 18480 

ggatcggagc tttgccagct ccgcctgcag catgaaatat tcgtcccagg acgggcgctc 18540 

aaaaccgctc aacggccatg gtgccaccgc ccggcatatt atggtatatg ccccggtgta 18600 

cgaaaccata aaacaacagg ccgcgtcagg gccgcgcgtg gagaccgcac acataacggg 18660 

caaatacgta gagcccggcg ccgtcgagag gcgcgactac caggtgggcc ttgccgagca 18720 

ggccatacgg gaaaactgca tagtggtgct gcctaccggc ctcggcaaga cggccgtggc 18780 

cctgcaggtg atctcccact atttggacga aggcaggggg gctctcttcc ttgcgccgac 18840 

aagggtgctg gtaaaccagc accgccagtt cctgggcagg gcccttacca tatccgatat 18900 

taccctggtc acaggcgagg acaccgtccc gaggcgcaaa aaagcttggg gcggcagcgt 18960 

gatctgcgcc acccccgaga taacaagaaa cgacatagcg cgcggaatgg tcccgctcga 19020 

acagttcggc ctggttgtgt tcgacgaggc ccacagggcg gtgggcgact atgcctattc 19080 

cgcaatagcg cgtgcagtgg gggagaactc tagaatgatc ggcatgactg cgacccttcc 19140 

aagcgagagg gagaaagccg acgagataat gggcactctt ctctcaaaga gcatagcaca 19200 

aaggaccgaa gacgacccgg atgtaaagcc ctacgtgcag gagaccgaaa ctgaatggat 19260 

aaaggtggag ctgcccccgg agatgaagga gatccaaaag ctcctgaaga tggccctcga 19320 

cgaaagatat gcggccctca agaggtgcgg ctatgatctc ggctcgaaca ggtcgctctc 19380 

ggctctgctc cgccttcgca tggtcgttct aagcggcaac aggcgggcgg caaagccttt 19440 

gtttactgcg atacgcatca catacgcgct caacatattc gaggcccacg gggtcacgcc 19500 

gtttctaaag ttctgcgaga ggaccgtcaa gaaaaagggc gccggtgttg cagagctgtt 19560 

cgaggaggac agaaacttta caggggccat ggcgcgcgca aaggcggcgc aggcagccgg 19620 

catggagcat ccaaagatac caaagttgga agaggctgtg cgcggggcca aagggaaggc 19680 

gctggtcttt acaagctaca gggactctgt cgatttaata cactcaaagc tgcaggctgc 19740 

cgggataaac tcggggatcc tcataggaaa ggcgggagaa aagggcctca agcagaaaaa 19800 

acaggtagag actgtcgcca agttccgcga cgggggatac gacgtgctcg tatctacaag 19860 

agtgggcgag gagggcctcg acatatcgga ggtaaacctt gtggtattct atgacaatgt 19920 

cccaagctcg ataaggtatg tgcagagaag gggcaggacc ggcaggaagg acgcgggcaa 19980 

gctggtggta ctgatggcaa aggggactat agacgaggca tactactgga taggccggcg 20040 

caagattact gccgccaggg gcatggggga caggatgaac aagtcgcttg cagcgggggg 20100 

ccctgcgcca aaggcagccc caaaaaaggg gctcgagggc tatttctagg cgggcttatc 20160 

ccaggcgctt tatcacgtgg tagccaaact cggattttac cggctcggat acctcgccta 20220 

cctgcaggcg gaacgcggca tcctcaaacg gctttaccat cttgcccctg ccaaagtagc 20280 

ccaagctgcc gtccctcttt gcgctgcccc cgtctatcga gagctccttt gccagctttc 20340 

caaacttttc gcccgccttg aggcgctctt gcactgcgag cgcctcgccc tgctttttta 20400 

ccagtatgtg cgagcacttt atcttgtccg ccatgcgcgc gcacattcca taccctgcta 20460 

taacttctcg gtatgcaggg ccctgccggc cggcagatct tccgggcggg cccgccaagc 20520 

tagactttta attgggatcc ggcggggcgg cgcatgtctt tgtattttac gataaagacg 20580 

gccaacctgg ccctgcccga cgtggtaaag aggtacaacc acgtcctggc gtgcaagagc 20640 

gaggtgatga gggccgagaa gcagatccag gtgtccatct cgtcgtcggg cggtctggac 20700 

aagtacgcgg agctcaagca gcagttcaac tcgaggataa ccgagttcta ccgctcgata 20760 

gaggagctgg agaagacggg cgtggtggtc aagagcatag acgaggggct cctggacttt 20820 

cccgcaaagc gctttgggga cgacatctgg ctgtgctgga aggtgggcga gcgcgagatc 20880 

aagttctggc atgaaaagga ctcggggttt gacggaagaa agcccataga ggtaagtgac 20940 

gagtcactag tgtagatgct ctcctcctgg ctgcgcgtaa tacgcgtccg gttcctgctc 21000 
gcgtcggtga tagccgtatc agcgggcctt gccctctcct ggtggcacgg ccacggaata 21060 

gacgcgctca cagcggcact caccatggcc ggagtggccg ctcttcatgc aagcgtggac 21120 

atgctcaacg actactggga ctacaagcgc ggcatagata cgagaaccaa gaggaccccg 21180 

atgagcgggg ggacaggggt gctgccagag ggcctgctga gcccccgcca ggtgtaccgc 21240 

gccggcatca tatcactggt gctcgggact gccgccggcg catactttgt gatcacaacg 21300 

gggcccgtca tagctgcgat actcggcttt gcggtggtct cgatttactt ttactcgaca 21360 

aggattgtgg actcgggcct ctccgaggtg ctcgtcgggg tcaagggggc gatgatcgtc 21420 

cttggcgcct actacataca ggcgcccgag atcacgccgg ccgccctcct cgtcggcgcg 21480 

gcagtggggg cgctgtcatc tgcggtcctc tttgtggcgt cgtttccgga ccacgacgca 21540 

gacaaggagc gcggcagaaa aacgctggtg ataatactgg gcaaaaagag ggcctcgcgc 21600 

atactctggg tctttccagc tgtggcgtat tcatccgtga tagcgggggt gattatccag 21660 

gtgctgccag tgtactccct cgccatgctg cttgccgccc cccttgcggc aatatcggca 21720 

aggggccttg ccaaagagta tgacggggac aggatcatac gggtcatgcg cggcacgctg 21780 

cggttcagca ggactgcagg cgcgctgctg gtgctgggaa tactgcttgg ttgagtggaa 21840 

ctagactcga gactgtgtaa gcataagatg ggcatgcgat caagtaccag aaccgataga 21900 

attattctcc ataaaatcat ggaattccca caccccctga taaagatctg aagatctctg 21960 

cccctctgac ggaccagtcc agacgaaagc gccatctcat caaaagggtc ggtatttgaa 22020 

ttagtcacgt atgttgggga cgaacgtagt gaagtaccag cacaatctgt ccatcttcac 22080 

ccagatcatg cgattctaac tgcaccatga gagtcaatcg ggtgtaaata aattgggatc 22140 

acttattcta ctatcacgtt atcatctgtc atgtcaacga agatggtttg tataatctgc 22200 

gggttcctaa tatctccaat gtcatcagaa gttacttctt tcgtttctcc atcccgcaga 22260 

gtaacattgt gctgtcccat aggctgatag agcttctatg tatatgaaaa cttaccaact 22320 

ttacagggaa ttggaagata aatcaagggt tgttgaataa gtcgactagg aggcagcata 22380 

gtataatctc ctttgtacat tatgcgtaca tagccaccaa ccggttgtaa agcgacaccc 22440 

tgatcaggat ttcccgcatg atgctccctg ccttcctggc ctgcacggac tcgccgaata 22500 

tctttttgga cccctcaaac accatctcca cggatgtcca gcagtcgtat ccaatatcct 22560 

tgttccactc gtcgtaccgg cccacctagg ccttgaccac gagctgcccg gcacggcagc 22620 

ccgtgcccaa cctcgagttt atcctgatcg tggagatgaa catgtgtaac tgcagtcacg 22680 

gaaaccttca gtttctggac agctatcttg ggtctagctt aaacatacgc aacgtattgg 22740 

cagaattact agatttctat ggtcatgctt tcttctgtca ctctatcgaa cacggcacgt 22800 

ataatgtgtg tgcgcttttc atctacaaga cttgagcagg ttattacttt tgattccccg 22860 

caccgcatag taacgtcttg tatgtccatt atgaattaag acatctacgc gtataaaaac 22920 

atagtattgt taccggggcg gggctacccc agggtagcat catccccctt gtacatcatg 22980 

tgagcatggc cacaaaccgg ctgcaggcca acacctcgat caagatctac cgaatgatgt 23040 

cccctccctt cctggcctgc acggcatcgc cgaatatctt tttgaacccc gcaaacacca 23100 

tctccacgga tgtcctgcgg ccgtatccaa tatccttgtt ccactcgtcg taccggccca 23160 

cctaggcctt gaccacgagc tgcccggcac ggcagcccgt gcccaacctc gagtttatcc 23220 

tgatcgtgga gatgaacatg tgtaactgca gtcacggaaa ccttcagttt ctggacagct 23280 

atcttgggtc tagcttaaac atacgcaacg tattggcaga attactagat ttctatggtc 23340 

atgctttctt ctgtcactct atcgaacacg gcacgtataa tgtgtgtgcg cttttcatct 23400 

acaagacttg agcaggttat tacttttgat tccccgcacc gcatagtaac gtcttgtatg 23460 

tccattatga attaagacat ctacgcgtat aaaaacatag tattgttacc ggggcggggc 23520 

taccccaggg tagcatcatc ccccttgtac atcatgtgag catggccaca aaccggctgc 23580 

aggccaacac ctcgatcaag atctaccgaa tgatgtcccc tcccttcctg gcctgcacgg 23640 

catcgccgaa tatctttttg aaccccgcaa acaccatctc cacggatgtc ctgcggccgt 23700 

atccaatatc cttgtcccac tcgtcgtact ggcccacttg ggccttgacc acgaactgcc 23760 

cggcacggca gcccgtgccc aacctcgagt ttatcctgat cgtggagatg aacatgtgga 23820 

actgcaggca cggaaacctt cagtttctgg acagctatct tgggtctagc ttaaacatac 23880 

gcaacgtatt ggcagaatta ctagatttct atggtcatgc tttcttctgt cactctatcg 23940 

aacataaaac gtataatatg tgaacgcttt tcatctacaa gacttgtgga ggttattact 24000 

tttgattccc cgcaccgtag agtaacgtct tgcatgtcca taagggatta agacatctac 24060 

gcgtataaaa acatagtatt gttaccgggg cggggttacc ccagggtagc atcatccccc 24120 

ttgtacatca tgtgagcatg gccacaaacc ggctgcaggc caacacctcg atcaagatct 24180 

cccgaatgat gtcccctccc ttcctggcct gcacggcctc gccgaatatc tttttgaacc 24240 

ccgcaaacac cgtccggttc cccgtaccag ttcggcatgc agacgtcttt aatcaggccg 24300 

acaactctgc cgtagttgta ccctccaggt ttctccatta caaggccgga atcttttgat 24360 

ccaaatattt ttcggcgcac gatggcgcct aattaggcag ggttagtcag caggctcaat 24420 

caatatgcct attttactct tttacccacg gcttctacca cttttttaca cccattacaa 24480 
atgaacctct 
cacaacggac 
aagacttttt 
gtataatgtt 
tcgtgccact 
tctagattct 
catattactt 
aaacacctgt 
gtccaaaacc 
ttatccgcgg 
atgtgatgga 
aacctgataa 
caatctcaca 
gatcagcggg 
ccatgacaac 
ccttggcgac 
caagtcgggg 
ttccgtgggc 
attcatatcc 
ggagtatgta 
gggataccac 
catgaacgtc 
ggaccgcccc 
aatgcgggag 
cgcagagggc 
cggcggggag 
ggcctacatg 
agccgcgctg 
gcctggcctg 
caggtctaca 
cgagaaccta 
ggccagcctt 
tctggtctag 
tcctgtccac 
agactatccc 
tggcaaactt 
cgtggtgcgc 
atgctgcaaa 
tgaacttgcg 
tctcctcggt 
ctgcaatcct 
gctccggctt 
tgtcaaaggc 
ggttctgctt 
agtggtaccc 
gccgcctccg 
tgcagctctt 
ccagctcgga 
ccagcctgta 
catccccccg 
ggacagatct 
ctgccccggc 
ttatacaccg 
ggcgcgtatc 
ctctacctga 
tggatggcca 
gccggggagg 
tacgtctatg 



gtggcccggt 
acttgaaagt 
ctatgtatat 
gagtgcccgc 
ccacacagga 
tcatgcatat 
aactattctg 
acttcacaac 
aagaccgata 
cgggttgggt 
gatcaaggga 
ttgaggcccg 
ctttgtaccc 
catgccacgg 
tacaaggtgg 
gcggatgacg 
ataaacgtca 
agggccgtta 
acaaaggcgg 
aaaaaggaat 
tgcatgaagc 
gactgcatag 
gtgccggaga 
gctggccgca 
gacccgcaga 
aaccacggct 
gtaaagaacc 
gacattggcg 
ctgccggagt 
ccgggagtgc 
aagatcatgg 
acctcatggt 
cacctttttt 
gttggcagga 
cgttgtgccc 
tactatccgg 
ctttgtggat 
tatcggatag 
gtagcatgtc 
ggttggcagc 
ccctatggct 
cctgttgacc 
gggggattcc 
cgagctctgt 
aaggtctaca 
ggtgtcgcct 
gaacacccct 
gggatccgac 
cctgccgcgc 
gataacctgg 
gcgcagcagg 
gccgacttaa 
cccgggatcg 
acatgtggat 
tactgcccat 
cctactggtg 
ccctggggcg 
cggcaatatg 



gtccatttta 
gtacctgtat 
tgtgcaacat 
aacatccttc 
agtatggcat 
ccctcaaagc 
ttgtgcactg 
acagtgtaaa 
tggatagaga 
gatatcaagt 
cggcaatata 
cggctgtgcc 
ttcatacaca 
ccgagggtac 
tagacgggct 
ccaccgacag 
tagataccgc 
cagagctctc 
gatacgtgac 
acgtcggtgg 
ccgcgtatct 
atcttgtcta 
tcctcgaggg 
taaggtatta 
gcatgcagct 
ttaggttcat 
aggggacggg 
tgttcacaag 
ttggcgggct 
ttgcccccct 
gcgtgccccc 
cgcccggcca 
ggcagcttcg 
tacaggtcta 
cttcccgcaa 
gatacgaggt 
atctcccaga 
tgctcgtggc 
gggcagtgct 
tcgtcaaacg 
tcagggtccc 
ccgctcaggg 
gactttgata 
gatagcgcgt 
agtctgagcg 
atcactatga 
gccagctcct 
ccgtactttc 
tcgccattct 
aaccggttgt 
cgggccgcgc 
aatcgttgta 
gccgccttgc 
aaaggacgaa 
ctatgggtat 
gtcagtagct 
gctgatcggg 
gctcggcatg 



atggatgaat 
ggaaatgtcg 
ttggcatgat 
ttgcacagag 
ttaaaatcag 
tttttcatgt 
ctgacaaggc 
acctcggggg 
aagagaatac 
gccagctcag 
catggcactg 
tacagcacgg 
taaatcccgc 
acagaggata 
gcacctctcc 
ggccgtcaca 
gataaactac 
agaggagggg 
caacgattca 
cggcgtcata 
agaggaccag 
cgtgcacaac 
gataggcgag 
cgggctcgcc 
cgaagcagtg 
acagctgcca 
cggcggcaag 
cgtcccgttc 
ctcgcccgcc 
gccggggcac 
cattcctcct 
gaaatagtca 
agtccgcaga 
ttcccgtaaa 
acggatccag 
cttctgggaa 
cgttcccagg 
cccctatccg 
tttcggggtc 
gcgtctcggg 
tcctcccggg 
cctcgttgcc 
gcaccagcac 
ttttcttgta 
cgagccggtg 
agaggctgcc 
cgacgaactc 
tgtgcccgta 
tttttgcaag 
tcatgttgcc 
gatcctacgg 
ggatgcggcg 
agcacacgca 
ttcctcggcc 
atctttctgg 
ctcagccccc 
gatcacgtat 
gcccatggga 



ccgtgctgtc 

ggggggcgtc 

acaccattcc 
cacataatcc 
aataaggcat 
ccatgttctc 
cctgaagttt 
atcacaagga 
acgccgcggc 
ggggcacgac 
catgccgcag 
gatattggcg 
ttggatgtgc 
gccgagatgt 
aacgtgggga 
gacgcggtca 
cgcctccaga 
ctggtatcca 
gaggtctccc 
cagtccgggg 
ctaaagagaa 
ccggtggagg 
gcctttgcca 
acgtgggagt 
gtaaaaaagg 
ttcaaccagt 
tcatccatac 
atgcagggca 
ctgcggtccc 
aagtccagcc 
gacaagttcg 
gcgtgttccc 
atctttcaca 
gcccctcttg 
cacatagtcg 
caccgcaaag 
gttcttgccc 
cttgcgcgtc 
atacccgtgg 
ggacgagccg 
ggagaactgc 
ctggacgcgt 
aaactcgtac 
ccagactata 
cgggaccatc 
gtcgtcggta 
atcaggcgtc 
atacgggggg 
cctgggcagc 
tgccccacat 
gtcatacaca 
ccgcagatgc 
gtataaacgg 
cgggcaacaa 
agtactatcc 
cgatagtgcc 
tgtttggcat 
taatcctgct 



ctcatgtgaa 
cttttcaccc 
ccctttcttg 
gggattcgat 
gatggagcct 
ccccaaatgc 
ccgagtactc 
aaggagctcg 
ttttggggat 
atgccgcagg 
gtgtggccat 
cagaacgggg 
ggctgcgcat 
ccggcgcaca 
tgggcaccta 
agaggtcaat 
gggccgagcg 
gggaccagat 
tcgacttttg 
acatatcctc 
gccttgcaaa 
ggcagatcaa 
tgtacgagaa 
gcttccgggt 
ccaaggatgc 
actttgacca 
tggaggcggc 
agctgctcga 
tgcagttcat 
tgcatacaga 

gggagcttgt 

tcgggcatta 
ttgcgccggg 
aggcacgccg 
ccctctcttg 
tgctcgttgc 
cgggggttgc 
gcatgccttt 
gcccgcgata 
tgtatcactg 
agccggtcgc 
atcgggtcta 
gcctgcgtaa 
tcctcttgaa 
agcttccggc 
agcaggtcca 
ccctcctggc 
gaagtgacgg 
accgcccggg 
caggctggac 
ttctatgcgg 
attgcccgcc 
gggcccgggc 
gatgaggctg 
gttctttccc 
cacgcattat 
caccacaaag 
ggcagggcgc 



24540 
24600 
24660 
24720 
24780 
24840 
24900 
24960 
25020 
25080 
25140 
25200 
25260 
25320 
25380 
25440 
25500 
25560 
25620 
25680 
25740 
25800 
25860 
25920 
25980 
26040 
26100 
26160 
26220 
26280 
26340 
26400 
26460 
26520 
26580 
26640 
26700 
26760 
26820 
26880 
26940 
27000 
27060 
27120 
27180 
27240 
27300 
27360 
27420 
27480 
27540 
27600 
27660 
27720 
27780 
27840 
27900 
27960 



WO 00/18909 



PCT/US99/22752 



-10- 

ctccggggac ctaggcaggc gccacggacg ggcatcccat aggctctggg gcatccgcgg 28020 
gtccccgcgg tccaattaaa tacagcaagg aacgggtagt ttcgttgaag ctgcaaggca 28080 
agactgccgt gatcaccggc agtggtaccg ggatcgggct ggcggtggca aggaaatttg 28140 
ccgagaacgg ggccagcgtg gtaatactcg gaaggagaaa ggagcccctc gatgaggcag 28200 
cagcagagct caaaaagata gcggaatctg caggctgcgg ggcctcgatc aggatattcg 28260 

ccggggtgga cgtggccgac gaatccgcga taacgaaaat gttcgacgag ctgtccagct 28320 

caggtgtaac cgtggacata ctggtgaaca atgccggcgt gtcggggccc gtcacgtgct 28380 

ttgccaacaa tgatctagaa gagttccgcg gggcagtcga catacacctg accggctcct 28440 

tctggacatc gagggaggcc ctcaaggtca tgaaaaaggg ctccaagatt gtcaccatga 28500 

ctacgttttt tgcagaagag aggccactcg agcagaggcc gtacaggttc cgcgacccgt 28560 

atacaaccgc acagggcgca aagaacaggc tcgccgaggc gatgtcgtgg gatcttttag 28620 

accgcgggat aacatcgata gcgaccaacc ccggccccgt ccattctgac aggatataca 28680 

agacggtata cccgagggcg gcactcgagt ttgtcagggt ttcagggttt gaggacctgc 28740 

agccagaaga agtcgaggtg gcaggcggca ggctaatcca cctgctcggc gcggacgacg 28800 

atgcaagaaa aaaaggcata gcagaggccg cagagcactt tgccaagcta aagcccgtgg 28860 

atcccgcaaa gctagaggcc acccttgatg ccctgctcgc aaagatcaag gggatagccg 28920 

aaaagataca ggccaacact gcaaggatga taccagacgg ggagtttctc tcccaggacc 28980 

aggtggccga gacggtactc gccctctgcg atgacaagat ggccaagacg gtaaacggcc 29040 

gcgtaatccc cgccgacagg gtattctacc cggtaagggc gcatgtggcc aatgccgctc 29100 

cgcgcgtgcc cccgcacgac tattccgggg gatgcgtcct attcatgata gatgcagcag 29160 

acgacaggga tgtagaaagg gcgaccgccc tggcatccca tgtggaaagc cacgggggca 29220 

cggcagtctg catagtctca gaagactcgc cccgcgcggc aaaggagatg atagcgtcaa 29280 

agttccactc gcatgcgagc cacatagaca aggtagacga gataaacagg tggctgagcg 29340 

ctgcatcaac aaagataggc cccatatctg cagtggtcca cctgtccggc aggatgccaa 29400 

aatccggcag cctaatggat ctctccagaa aagaatggga cgcgctggtt gacaggttca 29460 

tagggacgcc ggctgccgtc ctgcacaggt cgcttgagca ctttgcaccc ggcgggcgca 29520 

aggacccccg tttgttcaag ggcaagagcg gcgtcatcgt gataataggc cccgacctgc 29580

ccgcggggaa aaaggcctcc ggcgccgaga gggcaagggc ggagatcttc cggggtgcgc 29640

tcaggccgct gacgactaca gtcaaccagg agctcagcga tgtgctaaag tcaaacgtgc 29700

gcctgtttac catccttccc ggcagggcgg acgggggcga gaccgatgat tcccgcatat 29760

ctgctgcaat cgactacttt ctgacccccg aggctgtctc gtccggcgag gtcatattct 29820

gcgtagacga gaacaggggc tagcccgcca cagcgtccca gaagagtttt gctgccacta 29880 

caagtatgac cactgccgcc agtatgcgca ggcccctctc cctcaggttc agcgagagct 29940 

tggcgcccag caggcccccc gcaaacgcgc cagacgagag cagcagtgca tggtaaaagt 30000

cagtgtgccc cagcagcgta tgggtgacca tgccggtaaa tgccacaaac atcaggacca 30060 

gctgcgcggt gggcgcggcc ctccacatgc tcatgcccat tatcgccacc atgagcggga 30120 

caaatacaag gcccccgcct atcccaaaga agctcgatat tatccccgcg aaaaagctgg 30180 

ccgctatgga tagcagcagg acggtcaggt gggagcgccg ctgcccctcc ccaatcctgc 30240 

tgctgagcag caggtatgca gcagatccca ccaggacgat cccaaagaac aatctgaaga 30300 

tatcaggcgt tgctgcggcg gagaatagcg cccctagaac ggtaccgggc agcgccagca 30360 

ggcccagcgt cagccccgtc cggtagtcta tcctcttctg cctggcatat gacgcggtgg 30420 

ccgccgacgc gctgctaaat gccgcaaaga ggctgctgct ggctgcagcc gtcggtgaaa 30480 

agcccataaa ggtcagcacg ggaaccacca cgaacccgcc gccaagcccc accatcgagc 30540 

cgattacccc ggctgccagg ccaagcagcg gcagccatac ctcctccatg acaacctgcg 30600 

gcactgctgt tatgtataat accgtttttg ggggcgagga aacggatatc atgcctagaa 30660 

gccgtctgtg ctatccctgc cgacccccaa gccccaccca cttcacactc atctagtaag 30720

caaatcttgc accaagccgg atattttcca tggttttggc aataaattca ataaacagta 30780 

cgttcggtca tactgcatgg caaaagagac catagggaac cgcccagtcg atatcatgaa 30840 

ggagggcaac gaggtcaaga tagtattcca ccccattcta aaaggggcaa aacaccccga 30900 

cgcggccgta ttctcgataa aactgtcgaa aaaagattta gagatcatca ggaatgcttt 30960 

ctaacatact gtaaaatctc aaaaacatac tgcaacgtgc gatttatata cgggaaatgc 31020 

gatcgcgata tagtgacaaa atataaccga tccgctagtg aaagcatggg atcggagacc 31080

tatgggtttg atgccatacg aggcatgcaa gctaaccacg aatattacgt taccatatgt 31140

cctctaaaga ttattcccaa gctcttcata ttcaacgagt atgagctccc ggcaaagctc 31200 

agagctcaaa ggacactccg aaaatctaga attccaaccc tcaaggacta catactaagc 31260 

aatcctgacg agtacatatt ctcgtccctc gctgcatcgg tggatgggcg catgaagttt 31320 

atcccagccc cgcatctggg gccagatggt aaaatgggca gactccacat agacatgtcc 31380 

tccaagctaa taatcaacga cggacaacac cgccgcaagg caatagaggc agccttgctc 31440
gagaagccgg atctgggcaa tgagtcaatt tcagttgtat tttttgagga tcgagggctt 31500

aaacgctgtc agcaaatgtt ttccgatctg aacaaaaatg ctgtcaaacc atccaagtca 31560 

ctcaatatac tgtatgacaa caggaatcca ttttctcgtt ttatagtgga catggtagat 31620 

gagatagatg ttttccggga cagggtagag ctggaaaaga ccaccatagg gaaaaatgcc 31680 

aaggaagcat ttactcttgg gggattgtct gatgcaacaa tgagactgtt cggcaaaaaa 31740 

tccttgtcgc gacccagcaa ggaacaaaag ggactcataa aggagttctg gaaatgcgtc 31800 

tcagccaaca tgcaagaatg gggacggtta gtagacggcg aaatgtcggc agacgagctg 31860 

cgtgcaaact atgtcaatgg ccataccaac tgccttaatt cactagggga ggtaggccga 31920 

acagtaatca agcagcatcc agaatcgtgg aaaagaaagc tttcctctct gtctcggatt 31980 

gactggtcca gggaaaacga ggtgtgggag ggcaatctta tacagggtaa gaagatggtg 32040 

aggaccacca tcggaataat gctcggggct ggcgtcatac ttcgggaatg cagcatacga 32100 

gttcctgaag agattgagag gtatgagaaa tgacatctgt gtttgacaag cggacgactg 32160 

acagcatata cgacgaggta cgctcagtat acctcaacga tgcacgccct tggatccttg 32220 

ggtttagtgg cggaaaagac tcgacatgca tggtacagat tgtatggaaa gccctctcgg 32280 

aactacctgc agacaagctg gacaaaaaga tctacatagt gtcgtcggac accttggtag 32340 

agtccccaca gatagtggag cggctgacca agtcacttga cagcatagag aaggcggcaa 32400 

aagaggccca tattccaata tcgaccaacc tactgcggcc tccgattact gacacattct 32460 

gggtccggat actgggcatg gggtaccctg cccccacctc catgttcaga tggtgcactg 32520 

atatgctcaa aatagcaaac gctgacaggt tcatcaaaga gagagtctcc gagtatgggg 32580 

aggtcatagt tttgctgggc acccgcaaga gcgagagcgc cacccggcag caggtaatga 32640 

atctgctgga gatagagaat agcgttctaa gccaccataa aaaattcgca cagacctacg 32700 

tgtatacccc cctggtggac tttgaggcgg aggacgtatg gaactacctg ctccagaata 32760 

agaatccatg gggcgataac aaccgagacc tacttgccct gtatcaggat gccaatgcgg 32820 

cagagtgccc tcttgtggtg gacaccagca cgccatcctg cggaggtggc aggttcggct 32880 

gctggacgtg cacggtggta gacaagcaaa agtctctgga cagcatgatt gaaaacggtc 32940 

atgaatggat ggaaccgctg gcagaattgc gccacattct aaagcagaca caggatcc 32998 

<210> 2 
<211> 42432 
<212> DNA 

<213> Cenarchaeum symbiosum 

<220>
<221> CDS
<222> (3)...(10421)
<221> CDS
<222> (10625)...(11434)
<221> CDS
<222> (11478)...(13046)
<221> CDS
<222> (13046)...(14620)
<221> CDS
<222> (23558)...(24862)
<221> CDS
<222> (24913)...(25728)
<221> CDS
<222> (26504)...(26881)
<221> CDS
<222> (29655)...(30491)
<221> CDS
<222> (34559)...(36067)
<221> CDS
<222> (37002)...(37403)
<221> CDS
<222> (37404)...(38282)
<221> CDS
<222> (39454)...(40572)


<400> 2 

ggatccccgc gccgccagga gagggcagcc ttggcggggt ggcaatatcc gacgacggga 60 

ggtacatgta cgcaatcggc agggatctgc tcacagtata ccggtataca atgaacccgc 120 

cccatgacat agcctcggcc gcgctcggtg cgcagtcatt ttctctgcct ggcggcatca 180 

gccccgcccc cggcgcgccg accggccttg acatctcgga tgacggccgc cacctgtacg 240 

tcccggacga aaacggcgtc gtgtacaggt ttgatctgga aagcccgtac aggctagacg 300 

gcggcacgtt tggctcttct gtttatgtgg gatccgacgt tgccgcgccc cgcggcgtat 360 

acgtggcgcc gggcggcagc ctcatgctgg tctcggatag tgcagacggc accatccaca 420 

ggtacgagct ggcaagcccg tacgagccgg cgggcgcggc aaacagggga tcattcgacg 480 

tgtcggatat ggacggctcg cctgtcgggg cggggtttgc gggcggcctg cacatgtatg 540 

tcgcgggaaa cgacaccgga agggtctacc agtatccggc gggcacgcac cagatacagg 600

aggcagccgc agggccgcgg ctgctctcgg ccgtcctgga caaagacgga accctgaggg 660 

cggcctttga cggcacggta gacgcgggat ccgtgcagcc cgggatgatc accatcaggg 720 

acggccatgg ctccaacacg ggaatacccc ttttgcttgc cgggggtgcc gcggactctg 780 

atgtcatgac atttgtggtc cccgagaaag acagggcaga ggctgccgca tacggggacc 840 

agtcgctgca tgttcccgcc gcggcgctgg cggggactgg cggcgggccg tttgtgcccg 900 

acttttccgg gggctcgctg ctggcgtccc tgtaccggca cgagcggccg ttccagggcg 960 

aggagatggc acggacggag agatccgaca ggtacgcgct tactgtaact gcaggcggga 1020 

gtcagatgca tgtgggcggc gccggcggaa acatcacctg gtacgatctt ggcacgcccc 1080

atgacataac gaccggcgtc cgcgcgggat ccgacatcct gccggcgtat ccatccgcgg 1140

gcagaaacgt ggtgccgtca ataacgggca ttgccttctc ggatgacggc atgcggttgt 1200

ttgcagcaaa ccggggcgac cgcattccaa tgtaccagct ggacagcccg tacgacatag 1260

ggagcgccag cctcgaggga accctgttta cggggttcca gtcgggcatt gcattctcgg 1320

atgacggcac gcgcatgttt gccgccctgc tcaccgagaa tgccatacgg cagtacgacc 1380 

tggagggccc ctatgacata cgcggggcgg gcaatgcggg ccagtacgac ctggacatcc 1440 

cgctgcaccc aggactgctg ttcctgctga cctcgggggt gcacttttcg cccgacggga 1500 

cgaggatgtt cgtcggcgag gggatatcag atgcggagga tgccaacgcg aacagggatg 1560 

tcaacgtcaa cctgtggcac aggtttgatc tctccacgcc gtttgatgtg ctcacggcgg 1620 

agcgcgtgga cacgtacgag tacagcacgg ggccggcagg cgatctcgag gacctctccc 1680 

tgtcccctga cggccgcaga ttgtacaccc tgtcgagcga gagggtaagc tcaagcgagt 1740 

atacaatcac ccgggcccag tactggctgc cagaaccgta cgacgtgacg ccgccgtacc 1800 

atgtgccgtc attcaacgca agccaggggg gcaacctggc agacccctac gggatggcct 1860 

tctcgcccga cgggaccagg ctgctggtca cggggcacgg gcagacgaat gcaaagctgt 1920 

tccacctgaa tccgcccttt gatgtgggca cggccgtgtt ccacgaccac ggcaggttcc 1980 

gccccggggg gcccgcaagc gagatcgagg cgtcggggat atccctgtct gccgacggct 2040 

ccaggatgtt tctctccgac cgcggccgcg gggccatcag ccagtacacg ctggttgcgc 2100 

cctttgatgt ggagtttgcg tcggatgtgt ccgcggatgg gcagctcgac gttggcgccc 2160 

aggatgcgct tcccggcggg cttgccttct cgcccggggg gacgaggcta ttcatggtgg 2220 

gaggcatgga caggtcagtt cacatgtatt ccctgaatac gccgtttgac ctgggcgggg 2280 

cagagcatgc ggcgtcgttt ggcgtggggg acagggtctc ggatcccctc ggcatcgcct 2340 

ttgggaacgg ggggactaaa atgctaatag ccgatacgac aggctttgtg cacgggtacg 2400 

accttggcgc cccgtacgat atctcgggcc ccgcgtacag cggcatattt gacgccggcg 2460 

gcagcatccg ggacgtggcc gtcggcgggg ggtccatgtt catactcgag ggggagacgg 2520 

accgggtgta tgagcaccgc cccggcatat acccggtggt ctcagcactg gacgggccgg 2580 

cgctggtctc tgctgcagca gatgcaaggg tgggtgcggc cgaggtgctc tttgatcgcg 2640 

cggtggatgt tggcgggata gaccccgggg gggtccgcat agtggatgca gcaggccccc 2700

tgcccggcgt ggtgatctcg gatgccgtca taccaggcga ggatcccggc gtggccaggt 2760
tcagcctgtc ggacgcggag gtccttgccg tgtccgggta tgccgagccg agtctggtct 2820

ttggaaggca tgcggtgccg ggcgcggcag gcggcacatt tccctcccag ataggcaacg 2880 

ccacggagct tgtgggatcg attccgaatc cgaccctgga ttttgggacg accctgacgg 2940 

gggcggcatt ctcggcggac gggacggtgg tatttctctc agacggcccc accggcaggg 3000 

tgtacccgta ttcactgaat atcccctttg acatatcgtc tgcggcgcct gggggctttg 3060 

taatcgtgcc cgtcggagtc tcggacattg cgttttctgc cgacgggcgg aacatgctag 3120 

tcgcggacga aaccggggga atacacaggt acctggcccg cagcccgtac gagataggca 3180 

cggatttcat caaatcatcc ctgggtgagt ttgtcgagac attctcggcg gcgccccgcg 3240 

tgcaggatct tgccggcatc gccttttcgc acgacggcat gatcatgctt gcggccggcg 3300 

gctcggggtc tgtgcaccgg tactcgctgc catccccgta tgcagtatcg ggggccaaat 3360 

acgaggagac ggcgatgatt ggcgggagcc cgtcggggct ggagttctcg tccgacggcc 3420 

tgaggatgtt tgttcccgat gcgggctcgg agacggcggc agtctacggc cttgccgccc 3480

cctacgggat tggcgaggcg gagccgctgc cgccgctgtt cctgggggta ggggcagaag 3540 

aggccacgct ctcgcctgac ggcaggcaca tcctagttcc cggcaggccc ggcctgtccc 3600 

agtactcgct gttctcgacg aatcttgagc tgtgcgcgga gccccggggc attgacgggg 3660 

gatcgtgcga agatgggata tacgcctttg agagtccggg caggggcgag ggcgtatcgc 3720 

ttgccgcctc gataacggcg gcagacgggc caggaattgg cgagctgcac gggtttgcag 3780 

gcccgccgat gccggcgcct gtcatggagc aggtcacact ggattcgcgg gagggcacac 3840 

tcagggtcag gctggacagg acagtggacg tcgacacggt gcgcccctat aagatgtggg 3900 

tggaggattc agacggcagc cagacaaccc tggcaaattc aacactgttg aatgccgaaa 3960 

actcgaacat tctgctcttc aggctggatg atgcggccgc aggcaaaata tccgggtata 4020 

catcccccgt gtttcgcacg tggtcgtcgc cgttcctggg cacagacgga gccaccaggc 4080 

cccatacgct gggctttgga gacgtgcgcc ttgcggatat atacgatgca tccggggatg 4140 

tcccgtcgcc gtcgggcatt gagttttcag atgacggcat gaggatgttc gttacgggga 4200 

tcggcacgcc aggcatcaac atattcacac tgtccgcccc ctttgacata acattgccga 4260 

agcattccgg ctcaaccaac ataggcggcc tgtccgtgtc tgatctggca tttgcaaaca 4320 

atgggaacag cctcacggtg ctcgatgtgg acggggtgtt gcgcgtctac gcccttgggg 4380 

acgattacaa tgtggtcacc ggaaccaccc agaagtttag gattacgctc gataccacac 4440 

agggcatacc caattccatt tacacatctc cggacggcct gtcacagttt gtggcatatg 4500 

atgacaggat tgacttgtac gtgcttggca gcccaaacga catatcgtcg acaaccgaga 4560 

taatcccgta ttcgctgcca aggccggacc cgccaaccgg catggacttt acgccagacg 4620 

ggcgcaggat gttcctgtcc accgagaacg ggatagacca gtacctgctt tcagaaccgt 4680 

ttgcagtcac cacgtcggta tttttgcgca cgatccccat tgacggaggg gcggagggaa 4740 

tacggtttgt agacaacgga aggggcctgt ttgtgccggg cgccgacggc atcatccaga 4800 

ggcacgagct catctacccg tacggggcca gcacgtcgtt gttggagacc gtcagggacg 4860 

gcgtgacgga cggcggtccg ggcgagaacc cggccgccgg agagatccgc cttgcgggca 4920 

cattcaatgc atccgataat gtacagtcgc cgtcgggcat tgagttttca ggcgacggca 4980 

cggggatgtt tgttaccggg tttggggccg cgggcgtgaa tgaattctcc ctgtccgccc 5040 

cctttgatac aaccctcccg gtgcatgtgg aattgcacga tataggcggc cagccggcag 5100 

ttgatctggc gtttgcagaa gatggcagga ccctcctgtt gctggccgcg gatggaacac 5160 

tggatttcta cagccttgcc ggtgatgcct atgatatagg ggaagcatcc cgtacttttc 5220

aagtgccgtt tgaggatgcc gcgggtgctg tgcccggcgc cttttaccag cctccggatg 5280

gctcgtctat tattgccgca tttgacggca ggattgacca gtatgtggtg atccccttcg 5340

agttcgtgtc atatccactg acaaggcccg gcacgcccac agggattgac tttgcgccag 5400 

acgggcgctg gatgttcctg tccaccgaga acgggataga ccagtacctg ctgtcgatcc 5460 

cctttgacgt gcgcagcctg acgtatacgg gaaccattcc agtagacggg gtggagggaa 5520

tgcagtttgc ggacaacggc agggcactgt ttttggcgga cagtgaaggc ttgatttaca 5580 

attatgacct ggaggacccg tatgctctgg atggcaacac aatttccgtg gaattctcgt 5640

ttgacggtag cgtgatgtat gtgctggagt acgacacaaa aagggtggtc tcgtacgagt 5700 

tggagtttcc ctttgacgta tcgagcagaa cacgtgcaga cacgctggac ataccacaaa 5760 

ttgactcacc aagacacgtt gcagtctcga tgcccggcaa ccacctgtac ataacaaact 5820 

cSfSJtSJtttSSJ ggaagatgac accatacact cctatggaat atctaacaat gacatatcgt 5880 

cggcatcata catcggcgag gaaggcatcc cggaacccgt gataaacggg attgactttt 5940

ccaacaacgg ccgccgcatg tttctgattg ggggcaacgg gttcgactac caggtgatac 6000 

atgactacat gctaggcaca agatacgaca tatccagcag gagcctgctt gatacatatg 6060 

ccattccagg gccggttgtt tttcccgcgg gccttgattt ctcgtttgac aggctgtcca 6120 

tgtttataat aagcaccgcc ggttcggtat acaggtacgg cctggacgat ccgttcatag 6180 

ttgaaacaat ggactatcag gagtctttcc ggctgcccgt accatcagcg gctgataatt 6240
caatatcgga tctggcattc ggcagcagcg gcctgaatgc cgtaatatcg cacgaggggc 6300

tcgacaccct gtacagcttt gtactggaca tcccgtatgg ggccgaattg gatattgaca 6360 

ggcttgagct tccgctggtg ggggttccga cgggattcga gttctcggac aacgggcgcc 6420 

agttgtacat tggcgcgttt cgtgactctc aatcctcgcc aggcaccctg cctgcgggcc 6480 

tgcagcgcta tgagcttggc ataccatatg acctggcttc ggctgtattt gcgcagtccc 6540 

tgggaatatt cgattttcct cccttcaacg gcatgcgggc caatggcagc ttggcaggat 6600 

tacatgtgcc gcccgatgga agcatcctgt tcagggccgg aaatgccgaa agaaccgtaa 6660 

tcagctatga catggacagc catgatttgg atacattatc attcagggaa tcattcaaac 6720

cagatgtcgg acagtcgaca cccaacataa gggacatgga catatccccg gacggcatgt 6780

tcctctacct gcttcaaggc gatgttctgg acatgtacaa ccttacagat agttattcgc 6840 

ttgatgcccc ggcatatgcg ggtaccctgg atttggaacc ggaggatgta atacccaggg 6900 

ggatttcatt ctcacgggat ggcacgagtc tgtttatgac aggcgaagac gtggaccaca 6960 

ttcacgaata tgcattgaat gaaccatggg acatacgcaa tgccatactt gcaggctccc 7020

tgtccataag cgcagtgaat ggtgcaccgc gggggctgga tatatcggag gatggcacaa 7080

ctgcacatac tatgcgcggg cgtgactttg acacggggcc cgcatccctg gtaaaccaca 7140

tattgccagg ccaatattcc ctgctgacgg atgcgccggc gtttgcatac cccgtggagg 7200 

aggagggtgc accgggggat cttgcattct ccgatgacgg catgcgcatg ttcgtggcgg 7260 

gcgtaaacaa ccatttaaga cagtacaacc tgctgtcgcc gtatgacact gaaaatgcag 7320 

aacatttcat ctcgacggat ctgctgactg cggacagggg ccccacgggt cttgtatttt 7380 

cagatgagaa cgactttttc agcacaggcg ccagggccca atttgtgcgc cagtttacga 7440

caaaccgccc gtacgacgca tccacaataa cactgagtga caacggactg tacaaggtga 7500 

gcgtggacgg cctgccgtcc ggcatacggt ttacccccga cggcatgaag atgttcatat 7560 

cgggccagga gacggccatg atataccagt attccctgcc gtccccgtat gacacatccg 7620 

gggcggtcag ggacagggtt gagatagtcg cagggctctt tagaaatgca ggtttgtccg 7680 

tcgggttgaa cgagcccagt ccttccggct ttgacttttc ggaggacgga atggagctgt 7740 

acgtgacggg gtcgggcctt gttcacaggt atttcctgcc atcgccatac ggcctcgaag 7800 

atgcagcgta cgggggcagc ttccacacgt tcagggagag cacgccgctg ggagtggtgg 7860 

tgcgggggga tgccatgttt gtggccgggg acagtactga ttccatattg aaatattccc 7920 

tgaacgcaca acctgtcggc aacataaccc atgccgatac gcgcgccggg attgccgaca 7980 

gggcggagat cgtgtttggg gcaatggcag atacgcgcgc cgagattctc gacggcgccg 8040 

atgtagttca taagagtgtg aaaattgacg tattcccaat atcggagggc ataacagtgg 8100 

gcagggcact ttatccagag gacgccgcca tacttgatga cggcgcgaat gccacgcata 8160 

atagggttgt aatcattgtt cacgacataa cagaaggcga tgcgccgtcc atacatgatg 8220 

agccgattgc cgtggggatt tacgccctcg gccctatgga tacaatcgcc gtggttgatc 8280 

tccaccgcct ggccgtatcc gcatccttgt ccgggggtga ttccccgtcg gcctcagatg 8340 

catccggagt agtggccgag agccgcagaa acgcggtgga caggcctggc gtggaagagc 8400 

gcataggaca tggtgtatcc ctggaggcgg ccgacaggcc tgccgtcgac aacatgatgg 8460 

atacggatag tgccggcgtg tacgaccgca gtccggacga cgggcccgcc gtatccgaca 8520 

ggtccgcgct ggggcttgcc cggatggcag ccgacaggcc tgcagtcgat gacatgatgg 8580 

atacggatag tgccggcgtg tacgaccgca gcccggacga cgggcccgcc atatccgaca 8640 

ggtccgcgct ggggcttgcc cggatggcag ccgacaggcc tgcagtcgac gacatgatgg 8700 

atacgggcag tgccggcgtg tacgaccgca gcccggacga cgggcccgcc atatccgaca 8760 

ggtccgcgct ggggcttgcc cggatggcag ccgacaggcc tgcagtcgat gacatgatgg 8820 

atacgggcag tgagagcacg agcaggcttg gaccggttga caggccagaa atagtcgagc 8880 

gccacagcct ggccgcgtct gtatacctgt ccgggggcga ttccccgtcg gtcgcagacg 8940 

gtcatgatgt ggagtccgag ggccgcagag acggggggga caggcctggc atcgacgagc 9000 

gtatagtcat caagatctcg tacagccgcg gcgcagccga tgcgcccaga gtggaggatg 9060 

caatggagac ttccggcgtg accgcgtaca gccgcggcgc agccgatgcg cccagagtgg 9120 

aggatgcaat ggagacttcc ggcgtgaccg tccccaggcg cagtaccatg gacgcgccca 9180 

cagtggccga tgaccacagc ctggcccgga ccgcatccat atccgaaggc gattccccga 9240 

catttgcaga ggcgcgccgc gcggataccg ttggggatat agacgaggtg gacgcgccca 9300 

cagtggccga tgaccacagt ctggcccggg ccgcatccat atccgaaggc gattccccga 9360 

catttgcaga ggtgcgccgc gcggataccg ttggggatat agacgaggtg gacgcgcccg 9420 

ccgtggccga gaggctcctg gcagtcctcg gcctgcaggc ccctgattcg ccgggagtgt 9480

gggatactgt aggaatagat cactcggaga tttcaggcga tcctgtgccg gagccaagag 9540 

tagtgcccag gggcggtggc ggtgggggag gcggttcttc gaaccgcggc cttgaaccgc 9600 

atggcggcgg gtatgagatt gactttgagt tccgcataga cggcaggctg gtgctcttca 9660 

atgggacaga cgtgctagcc gaatccggca aggacctgct catccgtccg gtgttccggc 9720
cggaggggag tttcaacata tttgatatgg aggtgttgtt taccgccccc ggcggggaga 9780

tatcgactgc ctactacaac agggctggaa tcctcatggg gattgactgc ggcgagctga 9840 

ttatgaccga tacgacgtat tcatgcgaca tgctggacat attcggagat gagatatacc 9900 

atgtggagag gcttgacgca ttcaacggca tggtcatctc cttggacggc cccctcgacg 9960 

ggacggtcag tgtatcgctt cgtgacaacc acggcatccc gctggcgcag catcggctgc 10020 

ataaatacga gattttgatt ttggacgccg ctgaaaacag acccctgtca gtctcgacgg 10080 

accccaagcc cgtggaggat ccatcgcccg tgcagcatat agagtccctc cagatggatc 10140 

cggagcccgt ggagtccgag cccctcccga tggactccga gcccgtggag gatctggaac 10200 

ctgtgcagca tctagagtcc ctcccgatgg accccgagcc cgtggaggat ctggaacctg 10260 

tgcagcatct cgagcccgtg cagggatccc cgcccgtgca gggagggccg gagtccgtgg 10320 

agtcaggcat agcatacacg ctatggcagt tcctttcagg actgctggat gccctgggtc 10380 

ttgccgaccc ggatgtcgga tctgtccaaa aaacgtcctg atgcgttcaa aaagccggcc 10440 

ctgcccccgt gtgcgcggcc atgcttcaac atgatgggtt tgaagggacc agcccgcggt 10500 

cctgccggca tccaattccc gagatcctgt tacgtgcatc ggccatccct gtacgctgcc 10560 

acatggttac tttgtgtgat catttccggg cagagatcaa gtcattgatt ggaaaactta 10620 

aatcatgcat gggatcgagg gcggccgggg agatatgtcg gagaattttg tggcgttttg 10680 

cgtggcgtgc gccaggggag tcacaaagga cgagatgaag tatgtagacg ggagggtctt 10740 

ccacaaagag tgccatgcaa ggcacggcgg gcagatccgc ttccccaacc cagaggtcga 10800

gcagcgcgtg gccgagctga aggtggacct gatacagatg agaaaccagc tggccgagat 10860 

gaacagggcg tcgggggacg gaggggtgca ttccagcgcc acctctgcgg ccgaggccga 10920 

gcagcacagg gccgagctaa aggtacagct ggtgcagatg agaaaccagc tggccgagat 10980 

gaacaggaag gcccccggaa agccggcacg gaaaaaggcc gcaggcaaga ctgcacggag 11040 

aaagagcggc aagaagacgg tgcgcaggaa gaccggcaag aggactgccg gtaagaaggc 11100 

cggggcgcgg aggaagacta cggtcaagag gacggcgcgg aggaagacca cggcaaagaa 11160 

ggcagccggc agaaaggccg gggcgcgcag aaaggccaca gtcaagagga cggtgcacaa 11220 

aaagattgga gtgcggagga agactacggc aaggaggacg gccggtaaga gtacggtgcg 11280

caggaagagc acagtcaaga ggacggtgca caggaagacc ggcaagaagg cagtagtacg 11340 

caggaagagc acagtcaaga ggacggcacg gaggccggcc ggcagaaaga cccccggaag 11400

ggccgcgcgc agggccggcg caaagaggcg ctagcctgct gattaggaat ttaaggcggg 11460 

cgccgggcag caggtaaatg cagtcgcttg gacggctaga cgaggcgtgc gcggagatat 11520 

cgcgcagcct gcttgaatac gagtccccca ccgccggtga tgtccggacg gagatcagaa 11580 

gggcatgcac aaagtactcg ctccggagga tcccaaagaa ccgcgagata ctggccaccg 11640 

ccaggggtca ggactttgac aggctgcgcc ccctgctgct caaaaagccc gtaaagaccg 11700 

catccggggt ggccgtgata gcagtcatgc ccatgccgta cgcgtgcccc cacggcagat 11760 

gcacatactg ccccggcggg gaggcgtcga acacacccaa cagctatacc ggcggcgagc 11820 

ccatagcggc gggcgccatg aacagcgggt acgacccgga agagcaggtc cgcgcgggtc 11880 

tggcccggct gcgcgcgcac ggccacgatg tagccaagct ggagatagta atagtgggcg 11940 

gcacattcct gttcatgccg caggagtacc aggagtggtt cgtcaagtcc tgttatgacg 12000 

cgctcaacgg gtccgcttcc gcggggatgg aggaggccaa gcaccgaaat gaaactgccg 12060 

tgcacagaaa cgtgggcctc accatagaga ccaagccgga ctattgcagg acagagcatg 12120 

tggacgcgat gctcggcttt ggggccacgc gcgtggagat aggcgtgcag agcctccggg 12180 

aggaggtcta cttgagggtc aaccgggggc acggctacca ggatgtgaca gagtcgtttg 12240 

ccgccgccag ggatgcaggc tacaaggtgg ctgcccacat gatgccagga ctcccggggg 12300 

ccaccccgga aggcgacatc gaggatctgc gcatgctgtt tgaggatccc gcgctcaggc 12360 

cggacatgct caaggtgtac cccgcgctag tagtaagggg cacccccatg tatgaggagt 12420 

attcgagggg cgagtattcc ccgtatacgg aagaggaggt catccgggtg ctctccgagg 12480 

ccaaggcgcg cgtgcccagg tgggcgagga taatgcgcgt gcagcgcgag atacaccccg 12540 

acgagatagt ggccgggccg aggagcggca acctccgcca gctggtgcac aagaggctcc 12600 

aagagcaggg ccgccgatgc cgctgcatac ggtgcaggga ggcggggctc gcggggagga 12660 

ccgtgccgca gaagctccgt attgacaggg cggactattc ggcctcgggg gggagagaat 12720

cgtttatctc gcttgtagac ggggatgatg ccatctatgg ctttgtgcgc ctgcgcaagc 12780

cctccggagc agcacacagg ccggaggtca caccggaatc ctgcataata cgcgagctgc 12840

acgtatacgg caggtcgctt ggcctcggcg agaggggcgg catacagcac tcgggtctag 12900 

gcagaaggct cgtctcagaa gcagagtctg ccgcccgtga gcttggcgcg ggcaggctcc 12960 

ttgtgataag cgccgtcggg acaaggggtt actatcgcag gctcggatat tcacgcacgg 13020

gcccctacat ggggaaggtg ctctgatgga gacgataggc cgcggcacct ggatagacaa 13080 

gctggcgcat gaactggtag agcgcgaaga ggccctcggc cgggatacag agatgataaa 13140

cgtcgagagc ggccttggcg cgtccgggat accccacatg gggagcctcg gggatgcagt 13200
cagggcgtac ggcgtggggc tcgccgtcgg cgacatgggg cacagcttcc ggctcatagc 13260

gtactttgac gacctcgacg ggctccgcaa ggtccccgag ggcatgccat cctcgctaga 13320 

agagcacata gcccgtcccg tctcggcgat acccgacccc tacgggtgcc acgattccta 13380 

cggcatgcac atgagcggcc tgctgctaga ggggctcgac gcactgggca tagagtatga 13440 

ctttaggcgg gcaagggaca cgtaccgcga cggcctgctc gcagaacaga tccacaggat 13500 

actatcgaac agctcggtaa taggggagaa gatagccgag atggtgggcc aggaaaagtt 13560 

tcgcagcagc ctgccgtact ttgcagtctg tgaacagtgc gggaagatgt acacggccga 13620 

gtccgttgaa tacctggcag acagccgcaa ggtgcggtac aggtgcggcg acgccgaggt 13680 

aggcggaaga aagatcgccg gctgcgggca cgagggcgag gcggacacgg gcggagccgg 13740 

cggcaagctc gcctggaagg tggagtttgc cgcaaggtgg caggcgtttg atgtacgctt 13800 

tgaggcatac ggcaaggaca tcatggactc tgtaaggata aacgactggg tctccgacga 13860 

gatactatcc agcccgcacc cccaccatac aaggtacgag atgttcctcg acaagggcgg 13920 

caaaaagata tcaaagtcgt caggaaacgt ggtcacgccg cagaaatggc tcaggtacgg 13980 

caccccccag tcgatactgc tcctcatgta caagcgcatc acgggggcgc gggagcttgg 14040 

cctcgaggat gtgccatccc tgatggacga gtacggcgat cttcagcgcg agtactttgc 14100 

gggagggggc aggggcggga aagcccgcga ggccaagaac agggggctat tcgagtatac 14160 

gaacctgctg gaggcacagg aggggccgcg gccgcatgcg ggctaccggc tgctagtcga 14220

gctctccagg ctgttcaggg agaataggac cgagcgcgtc acaaaaaagc tcgtcgagta 14280

cggggtaatt gacgggccct cgcccgggat cgagcggctc atagcactgg ccggaaacta 14340 

tgcagacgac atgtattctg ccgagagaac agaggtggag cttgacgggg ccacaagggg 14400 

ggccctctcg gagctggcag aaatgctcgg ttccgccccg gagggcggac tgcaggatgt 14460 

catatacggc gtggccaagt cccacggggt gcccccgcgc gactttttca aggcgctgta 14520

caggataata ctggatgcat ccagcgggcc gaggataggc cccttcatag aggacatagg 14580 

cagggagaag gtggcaggta tgatacgggg gcgcctctga tggtccacga cgtacacaac 14640 

cggcggcgga gcggcggctt tttcctgata atactgggcg ccctcatctt catcacggcc 14700

ccgctgtacc tctcagattc gccggagctt ggggcggcag ccatagcctt tggctttgtc 14760 

gtgggcgggg cggggttcta tctcaacttt atcaaaaaga aatcctaggg ggcccggccc 14820 

aagcatttta tggcaagccc ctgtgggagc acccatggga ttgttcagga aaaagagccc 14880 

cgaagaatcc gagccggggc cggacgagcc cgggcccgag gcggagatgg aaaaggtgag 14940

ggcggacctt gccggcgtgc agagagacgt ggtcaaaaag tattccgagc tgacggccct 15000 

ctcagagaag ctcgagaggg taaagacaga gtatgattcc accgtgggct cgctgatgtc 15060 

cgagagaaag gggctcgccg aggcgaaaaa agagtccgca tcgctcgagg aggcgcgcgc 15120 

aggcctcgcc gaagaagtcg agcagaagag ggcaaaactc gaccaggcgg agattgacct 15180 

tggggataga aagggcagga tagaggagct cgaccgcgcc cacgcggctc tcgccgggat 15240 

aaaggaggag tcagacaggg gccgcgccga gctgcacgag atcaagcgga agatcctcga 15300 

gtcacagggc gcgctcgaca gggccaggga cgcgcaggcc aaggcggaag cggagctgca 15360 

caattccgcc gaggggctca agtccgcccg ggacgaggcc ggaaagctct cccaggagcg 15420 

cgacaatata cgggccgaga tagatcttgc aaaaaaggag ctaaaggtgg tccggggcca 15480 

gatggagtcg tcccccgagg ggggcgccga aaagcacgtg gtcgaggccg caagcgcgat 15540

ggtctcgtcg ctcacccaga ggctggctgc cgccgagagg gagcttggcg tggtaaagaa 15600 

ggtattggag agggagagaa gggggcgcgg tcaggattcc gaatagatct ttttccactt 15660 

gctcttcatt tcttcctcgc tcattcccag atcgtacagg ggtgatgtga tatcgcggat 15720 

ttcatccacc gttatcttta ccacggaaga tatcccgctc tttatgccgg tctttttaaa 15780 

atgggacgcc atcgtatcga actcttcgcc tgaatcgagc acggcggcag tgccggtaaa 15840 

gcggtagccc tttctgagca gcgggtctat aacgttgatc tctatggcgg ggttggcgcg 15900 

caggttctcg acggtattgg gtgaccttat gttggcaaac gccaggtgcc cctcgtccca 15960 

tacgatcaca gagcccttgg gcgagaggtt cggcttgttg tccggggtta cagtggccac 16020 

aaagcccatc ttgatgcggt tgacgtgatc ccttacctct gttccgatgg tcgccatggg 16080 

ctatccggtc atgcggggtt aaataaagct ggggtgcagc cgcagacact cagccgttga 16140 

ttatacagta ctcgcactcg caggtcttgc tctcgtcatc ttttgccctg cattttgcgt 16200 

gctggcccga ctggcactgg acgcagatct cgtctatgtt gtctatcgag gtgtatttct 16260 

tggactggtc aaagccgaag ccaccgtcct ttggtccaac catacagcgg ttatgccgcc 16320 

gtaccttaat tatctttgcc gtcggggcgg ggccgcttct gccggcggga gccccccttg 16380 

aggccgcccc ccggctcgtt gtcccacaga tgcatgctgc cccggatcat aaacgagccg 16440 

actatgatcg ccacaagccc gcccaccacc agtatgaaca gccttgccag gcccgccaac 16500 

ccgcccatgc gtgtaatatg cggcacccgt aaacaaacat tgcgcgaaga aacaggggta 16560 

ttccaggggc caaggcccgc aagctacatt aatttatcag tccggagggg cgggccccat 16620 

gagcagagtc aaggcgatag ccatctgggt ggccatgata ggcggaggca tagccttctt 16680 




tttcgcctcg gactatgttg tagaagtaat gaggtaaacg gggaaaacag gatccaatac 16740 

ggtttcttat cctccctgtc catggcggcc ccgttcatat aaagtcccaa cgttcaatgt 16800 

gccagatacg atggtccctc acaggggaat ttggactgtc ccgtgatgtt ggtggtttag 16860 

gaaacgcatt acctaggcat ttatgtagtt gaggtgcacg tccgagggat ctgtttcatt 16920 

atctggatca aataatggaa cttttctctt cattaatttc ctacattccg cctgcagttc 16980 

aagcatgatg gttacatcat cacgttcgat tatccgttct gttgtttcgg ttttttgctc 17040 

actcccatcc atgccaatat acgacggcat ttgtctaaaa aggtttccat cttcaagagt 17100 

cgtgtgtttg ccgtgcaatt acctcccagg ggagcataaa aaataagcca taaacgatgc 17160 

acagcatccg ccatggacct atgaaaaaac aagtcgtcag tggcgcacag tatccgatac 17220 

aacctctcaa tcgtgtcatc cggcaactct gcaccatgga aactccggac gatcgctatt 17280 

ctatacatgc tggttatcgt acctgtgaga atcctaatca tcctacggct atgttgatgg 17340 

tttgctgata actgaattac cgaggtgatt cgcgatgatt ttgttgattg gagtaaattg 17400 

agacaggata tccgctgcat gagataccga caatgtcaga tcctgaaatt cttggtcggc 17460 

ttctagatca gttattttca aaccgatgcc cctacactcc tctcgcgata tgtatctccc 17520 

atgactgtaa tattttccag gagaagctaa cattccagat attttttttg atttttctgc 17580 

cgcatcagac tcgccagcaa acatgtggtc ttccagccat ttttgtacaa gcacttcagc 17640 

tagtttctgg ctgctaatgc atttttgaac cagtccagga ggatattgtc ctaacaatgg 17700 

aagccatgcg ccaagcctgc ccggatgttt ttcagatacc acctgcactt cttgcaactc 17760 

gtcaattaga agctgtgcag acattatttg catgccaatc ttggttggga aaataaattg 17820 

gggatcagcg ggtcctatag acgagtgttt gcccattacc agggaatttg atgcgcaagc 17880 

aagcatcgag gctgctgaca ttgcggcata cggtatgatg acccgaatat catcatattt 17940 

cgcatgaagg tatgtgacaa tcgattctgc agactcggca gaacctccag gactgtggag 18000 

tatcagatcc aattttttag tctttaaatc acgcatcatc ctcataaatc catacaggtc 18060 

accatttgtt atgagagctt cattaggcgt atgcggttcg tccgtcatcc agttggtcgc 18120 

atatagaatt gtgtccctcc ccgaatactg ttgcaatttt gaaagatagt cgtgaagtac 18180 

cacaccaggt gcctgctcac cggctgactc cattagtcga agtactttgt ttgatatttc 18240 

tgcgtagcta ggcatttatg tagttgaagt ggctgccggc gggaattaac ccatagtctg 18300

aatcgtatga cttgcctttt ttcttcatca atttctcata tacctcatgc aggtctagca 18360 

ggatggccat gtcactgcgc ttgcttggta attttgttgt ctcgggcttt tggtcgatat 18420 

tcttctccat gccgataaaa gtcggtgttt ctttaaaaac gttcatgatt ttaattttcg 18480 

agtctttgcc gtgcaatttc ctccagagat cataaaaaat gagtcataaa cggtgcacag 18540 

catccgctgt ggacccatga aatgggcccc cggcggtgca cagcatccgc tgggggctca 18600 

ataaaaaaaa tgagtcatca tgcatagtct ctatgtaaat ggctgaaccg gtgttttggt 18660 

cgattagtaa aggctggctc tccactcgcc gaagcttgtg ggttacacca ccttcctatc 18720 

aacgcagtct tcttctgcga accttcatcc gaagaaggaa tatcttgtct cgggatagga 18780 

ttcgtgctta gatgctttca gcacttagcc tagatggctt agctgcccgg cctgccctgt 18840 

cggacaaccg gtagaccagt ggccacgcct ctctgttcct ctcgtactag gagcgacttc 18900 

ccctcagata ttcgcgcttc catcaggcag aggccgacct gtctcacgac ggtctaaacc 18960 

cagctcatgt tcccttttaa taggcgagca gcctcaccct tggcccctgc tgcaggacca 19020 

ggataggaaa agccgacatc gaggtaccaa accgcggggt cgataggagc tctcgcccgc 19080 

gacgagcctg ttatccctgg ggtaattttt ctgtcacctc cgggccccaa tagtgggcac 19140 

acgaaggatc gctaagccag actttcgtct atgaattccg tgcgtttgga aatccattca 19200

gtctagtttt tggctttgcc ctcttcagcg gatttctgac ccgcttgaac taaactttgg 19260 

gcccctttga tatcttttca aaggggtgcc gccccagccg aactgcccac ctgcacgtgt 19320 

ccccggtctt caccgggtaa gtggcactgc aggaaatgtc tggtgttaca tcggcgtccc 19380 

ctgacgtccc aaagaacgcc aggaaaagac tcccagatac gctatgcact ccctgctata 19440 

ccacaagcac aagctgcagt aaaactccac ggggtcttct ctccccgatg gaagatgatg 19500 

gactgttcgt ccaccttatg tggcttcacc gggttgtagg cggggacagt ggggctctcg 19560 

ttgttccatt catgcacgtc ggaacttacc cgacaaggca tttggctacc ttaagagagt 19620 

cagagttact cccggcgtta accggtcctt agctcggttg aacccaagtt ttagataccg 19680 

gcaccggcca ggattcagcg actatacata ccctttcggg ctagcagtcg cctgtgtttt 19740 

tattaaacag tcgaaacccc cttgtcactg caacctgctg ccgccattcc tcatgacagc 19800 

tgcaggcatc ccttatacct aagctacagg actaatttgc cgaattccct cgccatacgg 19860 

tatacccgta gcaccttagt ttactaaacc agcgcacctg tgtcggatct gggtacgaac 19920 

ttgcagtttg ctagccgcac ggtctttcat ggtctcctgg agtcggggga actctgctaa 19980 

cgcaaagcca ctcccgcctc gggcctgttc tcgtcattac gacactccca ggccctcgaa 20040

cggttcgaca cgacgacggt catgctcccc ctatccggaa gcgaaccatg cggttcaaat 20100 

gctccctgca aggtaccaga atattaactg gtttcccatt cggactactc tgttgaggca 20160

gcccttagga tcgactaact ccaggctgac gacgcattgc ctggaaaccc ttgcgctttc 20220

ggtggtgcgg attctcaccg cactatgctg ttactgccgc caggatctgc aatagaaatc 20280 

ggtccacagg acgtcaccgc cctgcttcgt cccaatcact acgccaacct accacggtgc 20340 

acctgccacg gtgcacgtcc ggagtatcgg tactctgctt tagccccgtc cgtttttgtg 20400 

gcgccctcgc tcggcaggta agttgttaca cactttttga aggatagcta cttctgagct 20460 

tacctccctg ctgtcttggc gacgacacgc actttggctt gacacttagc agaaatttgg 20520 

ggaccttaac tccagtctgg gttaaacccc tctcggtcgt gaaccttacg tcacacgaac 20580 

ccgtgtccat gcttctgcga tgtgtatccg ttcggagttt gaatggatgg tgaggaatct 20640 

cttccccgcg ccaccctatc agtgctctac cggaaacacc atctccacat agcacgccct 20700 

gcgagacgct tcggttggaa ctagcaagcg ccagtctaga ttggtttttg acccctattc 20760 

ccaagtcaca caaacgagtt gcacgtcaga actgctgcag acctccagtg ggctttcgcc 20820 

caccttcatc ttgctcagga atagatcgac tggcttctag ccttaccgcc atgactcaac 20880 

gcactttcac acgcttctcc tcacaatgct gcgagaattc ggtttccctt cggctacgcc 20940 

tttctaggct taacctcgcc atgacagcaa gctccctggc ccgtgtttcg agacggaacg 21000 

cacgacactg acgacatgag ctccggactt ttagctccat tgctggaacc tccggtccga 21060 

aaaaatcgtc tttcatgccg tgcacgtctg taagcaatag gtttcatgca cttttcaccc 21120 

cccttccggg gtacttttca gctttccctc acggtactag tacactatcg gtcttgagag 21180 

atatttagcc tttgatgcta ctttcaccaa tcttcgctgc ccactgccaa ggacaactac 21240 

tcgggtgctg gccctgcccc attccacttc gtctaggggg gtatcaccct ctaagccgga 21300 

acatttcaga acactttgac tatttcgtgg ggccattgcg ccgcaccaaa acaccacatc 21360 

tcggccgcgt taccgcggca gattcagttt gggctctttc cttttcgatc gcctctactt 21420

gggaaatctc tattgatttc tcttcctcgt ggtactaaga tgcttcaatt cccacggttc 21480 

gacctccgct tgcgcggagt atacaggatt cctattcgga aatctcggga tcaacgggtg 21540 

cgtgcaccta ccccgagctt atcgcagctt gccacgtcct tcctctctcc tcaagcctag 21600 

caatcctcct attgccgtct ttacaccggc atattcagcc acatattaca cgactatgca 21660 

tgatgatcat cgcggtcccc aggggagggg cccgctacat ccttcatacg ccactttcgt 21720 

gacgcattgc accatgtgaa gatatgtgca ccccgttcaa accagtttct aaggaggtga 21780 

tccgaccgca ggttccccta cggtcacctt gttacgactt ttcccttgtc gcttacctca 21840 

agttcgataa cgccaattag acgtcacctc actaaaagca aacttcaatg aaacgacggg 21900 

cggtgtgtgc aaggagcagg gacgtattca ctgcgcggta atgacgcgcg gttactaggg 21960 

attccagatt cgtgagggcg agttgcagcc ctcagtcata actgtggtag cgtttgggga 22020 

ttacctcctc ctttcggata tggaacccat tgtcactacc attgcagccc gcgtgtggcc 22080 

ccagagtttc ggggcatact gacctgccgt ggccctttcc ttcctccgca ttaactgcgg 22140 

cggtcccgct aattcgcccc actgctcctg agagcaatgg tggcaactag aggcaaggat 22200 

ctcgctcgtt acctgactta acaggacatc tcacggcacg agctggcgac ggccatgcac 22260 

cacctctcag cttgtctggt aaagtcttca gcttgacctt cacactgctg tctctccggg 22320 

taagatttca ggcgttgact ccaattgaac cgcaggcttc accccttgtg gtgctccccc 22380

gccaattcct ttaagtttca tacttgcgta cgtacttccc aggcggcaaa cttaacggct 22440 

tccctgcggc actgcactgg ctcttacgcc aatgcatcac cgagtttgca ttgtttacag 22500 

ctgggactac ccgggtatct aatccggttt gctcccccag ctttcatccc tcaccgtcgg 22560 

acgtgttcta gtagaccgcc ttcgccacag ggggtcatca atagatcaaa ggattttacc 22620 

ccttcctact gagtaccgtc tacctctccc actccctagc cgtgcagtat ttccggcagc 22680 

ctatgcgttg agcgcataga tttaaccgaa aacttacacg gcaggctacg gatgctttag 22740 

gcccaataat cctcctgacc acttgaggtg ctggttttac cgcggcggct gacaccagaa 22800 

cttgcccacc ccttattcgc cggtggtttt aagaccggta aaagatttct ttagcagaaa 22860 

acactcggat taaccttgtc gtgctttcgc acattgcaaa gttttctcgc ctgctgcgcc 22920 

ccatagggcc tgggtccgtg tctcagtacc catctccggg cctctcctct cagagcccgt 22980

atctgttata gccttggtgg gccattacct caccaacaag ctgatagacc gcagtcccat 23040 

cctacggcga taaatcattt gggccacaaa ccattccagg catggtggcc tatcgggtat 23100 

tattctcagt ttcccgaggt tatccccgtc cataggttag attgactacg tgttactgag 23160 

ccgtctgcct tgtattgcta caatgactcg catggcttag tatcaatccg atagcagtca 23220 

ggtccggcag gatcaaccgg attcttattt ggattatttt ttttttcaaa gtacgcctgt 23280 

acttttggaa ttgaacggaa tgcacataat cttcacatct cagatatgac ccttcgatca 23340 

gatcctcatt ctgtgtgcgt aactggaggc ctgcgaatca caaaatggta caataccatg 23400 

gcttcatcgc aagcgccgct cttgcgtcac gtacgatcgg atcgccttgt ccatgggcat 23460 

ataaaccatc gccggtttcc gggcccgatc ggacccttga tcggcccgcg gggggcgatc 23520 

cggcctcatt aaattacggg gggtacaacc tgctgccgtg gatctagagc gcgagtacag 23580 

ggcaaagacc aggggctcgg cggggatatt tgcccggtcg agaaggtacc atgtaggggg 23640

ggtcagccac aacataaggt actatgagcc gtacccgttt gttacaaggt cggcgcgcgg 23700

caagcacctt gtggacgtcg acgggaacaa gtataccgac tattggatgg ggcactggag 23760 

cctgatactc ggccacgcgc cggcgcaagt aaggtcggca gtggaggggc agctgcgccg 23820 

cggctggata cacgggaccg caaacgagcc caccatgcgg ctctcggaga tcatacgcgg 23880

ggcggtaaag gcggcagaga agataaggta tgttacatcc ggcacggagg ccgtcatgta 23940

tgcggcaagg atggcgcgcg cacgcacggg aaaaaaagtg atagcaaagg tcgacggcgg 24000

ctggcacgga tacgcgtcgg ggctgctaaa gtcggtcaac tggccgtacg atgtgcccga 24060

gagcgggggg ctcgtcgacg aggagcacac cgtgtccatc ccgtacaaca atctggaggg 24120

atccctggag gcgctaaggc gcgcaggggg cgaccttgca tgtgtcatag tcgagccgat 24180

gcttggcggc ggcggctgca taccggcaga accggactat ctccgcggca tacaggagtt 24240

tgtgcattcg aagggtgcac tgttcattct cgacgagata gtcacggggt tccggttcga 24300

ctttggctgc gcgtacaaga aaatggggct ggaccccgac gtggtggcgc tgggaaagat 24360

agtcgggggc ggattcccca taggtgtggt gtgcggcaag gacgaggtga tgtgcatctc 24420

cgataccggc gcgcatgcaa gaaccgagag ggcgtacatt ggcggcggca ccttttctgc 24480

aaaccccgcg acgatgactg cgggtgccgc ggcactcggt gcactcaggg agagaagggg 24540

cacactatac cccagaataa actccatggg ggacgacgca agggcgcggc tctcgaggat 24600

attcgacggc agggttgcag tgaccggcag gggctcgctg ttcatgacgc actttacacc 24660

ggatggggcc cgcaggatat ccagcgcggc agatgctgcc gcctgcgatg tgcatctgct 24720

gcacaggtac cacctggaca tgattacaag ggacggcata ttctttctgc caggcaagct 24780

gggggccata tctgccgccc actcaagggc ggaccttggg gccatgtatt cggcgtctga 24840

gcgctttgcg gggggactgt gagttatacc catgggaaac tttgattata cgggcgtaca 24900

ttcccggggc ccatgatact cttcggcaag agcgacccct ccgacctgct ccgccaggcc 24960

gatcttttgt gcagtgggaa caagtacaag gcggcagtgg gcctgtacag caggatactc 25020 

aaggacgacc cgcagaacag gatggtcctg cagagaaagg gcctcgccct caacaggata 25080 

agaaggtact ctgatgccat aacgtgcttt gatctgctgc tcgagctgga tgatggcgac 25140

gcgcctgcat acaacaacaa ggccatagcc caggccgagc tgggcgatac ggcatccgcc 25200

ctggagaact atggcagggc catcgaagcc agccccaggt acgcgccggc gtactttaac 25260

agggccgtcc tgctcgacag gctcggcgag cacgaagacg cgctgccgga cctcgacaag 25320

gcgacaaggc tggacaggga caaggccaac ccgaggttct acaaggggat agtcctggga 25380

aagatgggcc ggcatgcaga ggcgctgtcc tgcttcaagg aggtgtgcag ggcggaccac 25440

ggccacgccg actcacagtt ccacgtggcg atagaggtag ccgagctcgg caaacacgcc 25500 

gaagccctcg gtgagcttgc ggcactgccc gcagagtacc gcgagaacgc aaacgttctc 25560

tacgcccggg cgcgcagcct cgccggcctg gacaggtacg acgagtccat tgcacacctg 25620 

caaaaggccg ccagaaagga ctccaagaca ataaaaaagt gggcccgcgc cgagaaggcc 25680

tttgatcata tacgggatga tcccaggttc aaaaagatag ccgggtaaac cctacagcat 25740

cttttttctt gccgcgtcta tccgcattat ccggaccttt tttttgggcc ccacaagccg 25800

cgactcgtag acaggggcat acacttcttc gaccgatctg actgcaaact cctcccggag 25860

gtcgcgcatg ccgtcaggcg ggggcccgcg gagctttacc cggagttttt ccagcgccac 25920 

cccggtgtct atctccgggc gccgcacatt ctccgacgaa tcaagcatgc gccgcgggta 25980

cggctcgacc gcgccggtcc ccgtcttgta gggaaagccg gtctccccgc cgtgccggtc 26040 

aaggcacatc acgccctctg attccgcgta tacgtgctcc tcgagttcca gatcgaccct 26100

gttcttgccg cgcccggcgg acagggtctt gcgtatgcgc gactttgacc ttatcgggaa 26160

gacgccgtcg cccagcacca cctcgatcac gttctgatcc accttgatcg ggtgcacggc 26220 

ccttctgaaa aagtcggccg agtaccgggc ggagacgcgg atgagcgcct cgtggacgag 26280

ccttacggaa tgtacatgga cgtcctcctt tgggggcgcc ctcatcatgg ccctgaatgg 26340

ccccgtcttt tttgcctcga tcatggcccg ggcggactgt tcagtcatgg cgttgcgcag 26400

gacgatgatc cttgtcttgt cagtcactcc cggcaggccc cctcatccgg catgccctgt 26460 

acacacggca aggtaataat agcctgccgt ccgtacctgc cgtatgaggt cagaagagag 26520 

gccgggtcac attgaaaagt tcctaaagag ggcggacaag gcgatcgaca gcgcggtcga 26580

gcagggcgtc aagagggccg acgagatact agacgatgca gtcgagctcg gcaagattac 26640

ggtgggcgag gcgcagagga ggagcgatgt gctgctcaaa caggccgagc gggagagcag 26700

gcggctcaag tccaagggcg ccaaaaagct cgaaaagggc ataggcgccg caaaaaagat 26760

ggcagcaggc aagggcgacg cgctcgagac gctcgcaaag ctcggcgagc tcagaaaggc 26820

ggggatcata acggagaaag agtttcgcgc caaaaagaaa aagctcctcg cagagatctg 26880

acatgaaggc cataacctac tcccgggacg gctccgcaaa ggaggtcaca aagaggtggt 26940 

ttgtcggtac tccttcactg atgaaccttg caggcgacct tggcatgacc gagagtgaca 27000

tattccatgt gacatttccc gacggcgcca agacgaccct gcacacacac gaaggcgggc 27060

agctgctgat agtcacctcc ggaacgggca gcatgtcggt ctttgaaaag accggcggcg 27120

gggataccga ctttgcgata aaagagaccg accgcatacc gctaaaggag ggcagcatac 27180
agtacatacc ggcgggtaca ttgcacgtgc acggcgccat cgagggcacc accctctccc 27240 

acatagcggt aaactatccc tccccgtcgg gaaaggagcc gtataccatc tggtacgaat 27300 

ccgactttgc gaacagggtc accggcgtgc tataagctac tttagccgct ccagtatgga 27360 

caggctcaca aggttggtca tcgttatccc cttgcggatg acctcgcgcc gcgtcttctc 27420

gcagtacttg ttgacgctct ggtagctctt ctcgctggag acctcgagga ctattatgtt 27480 

cccgctgtac ggaaaggcaa actttggcgg caccctcgag cagacgtcca tgcgccctat 27540

tcccgccctg cctatcttct cctttctccc caccacggac gccacctcgc atatgtcctc 27600 

ctcggaatcg ccgtactcca gatagtacac cgatacatag ctcaccccgg gggactccct 27660

ctcgaatatg tcctcgtcca tgctaaacac ataggatgcg ccctcccccc ggcccaccgc 27720 

cttgatcatt ctgggggcgt ccttgaacct tgccagcagg tagtggttgg cctgcggtgc 27780 

aaagtacggc tggctctcgg acgagaacgt ggcggtcaca aagtacatct cgcggacgtt 27840

cctcgacgag gcccacgcgt ccttgagctc gatcgaggcg tgatcgtgcg tgtatggcgt 27900 

gcaggcatgc ccgcccgggg cccccgtctg ggacatgcat tgtacccgcg ggaccggtta 27960

ttatctagtt ccatgggggc gcagggcgcc gcccccgtgt ggcatgcgtg gatccatgcg 28020 

atagttattt aaaactagga tgccgggcac ccgtcgtccc aagctagctc agcctggtag 28080 

agcttccggc tgtagatgtc ggccttggct gaccgtataa cagcatatca ggcatacaga 28140 

gaccgggttg tcgaaggttc aactccttcg cttgggacca cattataacg gctgccgcct 28200 

catgcggctg tgcacggcat ccgtacacgt tccatgcacg ggtgccgcgg cgtgccatat 28260 

gcatggatgg tgcatgtaca atgcacgggt gccgcgacgc actatacgca tggatggtgc 28320 

actatagatg cggctaaatg tgcacggcag agccgcaggg cccgggccgc gtgcacctat 28380 

attctgccct gtcccagggt caggagccgc gtcgccagaa cgatgtgcgg cgcccgcgcc 28440 

gcacggcggg gccgccgggg gcgcacgaca ccgcatcgcc ggacctggcg ccatttctct 28500 

ccagcagtgc gtcgagcgca gaatcgtcga ctagcgtgcg ggatggcagc ccgcccgggg 28560 

taggcacgcc ctgcagctcc attgcccggc gcatctcctc gttctccatg cggaggcggc 28620 

gctccttctc ctcgagcctg ctgctcatcc tgtcgaggaa cattacatct gaattgtaca 28680 

gctccctttg ctgccccatc atgcagagga tatggcgcag ccttgacgta tcgcatatac 28740 

cggatgccat gagctccctg accccgtcct ccgtgtggcc gatgtgccgg ccccccgccg 28800 

atgccctgcg cacgccggac ctcatcgctg cttcctcagg tactcccgga ctatcctgtt 28860 

ggcaagccgg gtctcgtcgg tgccctcgcg ctcggccacc cgggcaagct ttcttgcgcc 28920 

gttcttgccg agctctatgg ggatcttctt cctgacgccc gtcttgcgca ccctctttag 28980 

caggatcctg tggcccgagc gggggctctg ggctagcagc ctcaggtaga tccgcctcga 29040 

cggccgatca agccttgata tgttcagcgc caccttgagc gcctgggaga cggtcgggac 29100 

ggcctggtac agctttgtcg cctcgtcccg ggatatggtc ccggggacta gcgccttgat 29160 

cttctccggc acgcccgcaa agccgtggta ctttttgaac gtgggcatcg acatgccgag 29220 

cttccttgcg gcctcggcgc gggtcatctg ctcggcgaga aacctgcacg cgtcggcgag 29280 

ctcccggggg ctcatctgca tccggtgcag gttctccacg accgatgccg cctttgcgtc 29340 

ctccaggccg tactccgtat ccttggttat cacaagaaac ttggactttt ttgcgcccag 29400 

atgctttagg gccgcaagcc tgtggttccc cgatatgagc aggtacagcc ccctgccgcc 29460 

cctctgtatt acgggcgggt tctgcaggcc ctcggacctg atcgactttg caatctccct 29520 

caccctggac ctgtccagcc tccttgcctg cgcgtccttc cacacgtgca cgttcttgag 29580

gggcacctcg cgtaggacct gctttatccg gggcttgtag cgccgagcca acgtactcta 29640

caagatacaa atccttgtta actgtgtttg gtaagtttat cacaacaatt aggttagata 29700 

gagctgttcc cccgcaggcc cccgtgcacg tactctatcg cgcagccccc cgggggacag 29760 

ccggaaccgg gggctgccgg ggcgggatcc cgggcgtcga tagaataaat acgcgcgggg 29820

ccgcggtgcg atcgcccgtg ctgataataa actgcaaaaa ctatgaggag gccgccggcg 29880 

gcaggatccg cgggctggca gatgccgcgg ccggggctgc cgccaggtac ggcgtcagga 29940

tagcgatagc cccgccgcag cacctgctgg gcattatagc aggccgggat cttggcgtgc 30000

tggcccagca tgtcgacgac aaggggacgg ggagcaccac agggtatgtc gtcccggagc 30060 

tgctaaaaca gtcgggggtc tccggggcca taatcaacca cagcgagcac cgcgtacccg 30120

cggaccaggt ggcgggcctg gtaccaaggc tcaggggcct tggcatggtc tcggtggtct 30180 

gcgtcaggga tcccgccgag gccgccgatc tctcccggta ttgccccgac tacatagcga 30240 

tagagcctcc cgagctgata ggttccggca ggtccgtctc gacagagagg ccccaggtca 30300

tacaagaggc cgcagaggcc atcagggggg ctggcggcgt aaagctgctc tgcggggcgg 30360 

gcataacctc cggggcggac gtgcgcaggg ccctcgagct tggctccgag ggcattcttg 30420 

tggcaagcgg ggtcgtaaag tcggcagacc ccgcaggggc catcggggag cttgcccggg 30480 

ccatgtcctg acgcaccatt ctaggcgccc gcgccgttga ggcgcgccag caggtcaaac 30540 

gacgacctgt atagctcgcc tggcgaccgg gcgcccgcta tcaccacctt tcccgacgca 30600

aacacaagga agctgcagct gcccagcccc tttagtatca tgccgggaaa cgaccccggg 30660

tcgtacaccg cgccgggtat ccgcgacgat atcctgtcta tgggaacagt ccgtcctgca 30720 

tccactgtcg ccaccatatt gcgtacgacg ggccttgtac acccgccggc cgccgccccg 30780 

ttccggaaca ggtgcagccg ggcctcgtgc agctgcgcaa acgatgccct cacggagctg 30840 

gcgccgacgg atatcatctt gcccgagaga aacaccgtca cgcgcccccg catgccgggt 30900 

gttttgatat agccgcacct gccgccgtat accgcctcgt cgtacatgca gcatggcatg 30960 

gcggccatct tttttgcgcc caccctccgg cccaggtcgg cggtactcac aacgttgacc 31020 

accctgggcc gtttccttgg atccagcatt gatccggtgc ccggccgcac tctattttaa 31080 

ccggggcccg ggcggcagcc ccgcggccct gtcgtacctg cgctttagca gcattacggc 31140 

ggccatcagc gccaccccgg ccacgttggt ccctattatg tagacgtccc atatgtgcac 31200 

gccgtaggct atccagagca tggcccccgc gcctatcagc atggtcagat accacgatac 31260 

atccctgagg ctctttgtcc tgtacgcctt gactatctgg tgcacccatc ccgacagtat 31320 

cagcacgccg ccggcagccg ccacgatatc cagcagggcg atatccacgt tcatttgaaa 31380

aagaactgct ccatgccggt ctgctttggc ttgccgagta tctcgtcaaa gtcaaggccc 31440

atggacgagg tgagctggtc gagcgtcgac tccatgaact cgaggtactt tgacgtgtcc 31500 

acctcgcctg cccgggccat ctccaccggc ttgacgccgg tcttgttcat cacctttacg 31560 

tacgatatta tgtcgccctt tttgacctcc cttgcgttct ccagcagcct tgccgcccgt 31620 

atgtgctgcg ggacggtctt gacatattcg gagggcgcct tgcttatcat cacattgaac 31680 

gccaggtcca cgagggggat ctgcctctcc tcgagcctct tgccgcacgc ggcgatcgcc 31740 

tttgagatcc tcatcttggc tgactcgaac tcgtcctcgc tctcgactcc tgagagtatg 31800 

tcgagcagcg agtagaagag ctcctttatg aacgggggcg tgtgcgactt tttgcccgtc 31860 

agccccttga cgtcgacctt gcctgcccgg gtcaccccga aatagttttt tttcctgttg 31920 

gatagcacga catacctgta ctctttgtcc acttcgagct ccacaccgtg ctccttcttt 31980 

gcatgctcga ctatctcgtg gatctgcctc tcttcgggat cctttatgaa cagagaatcg 32040 

gtgtccccgt acagcaccct cactcccatc tgctcgcagt gcgatatcgt ctgcatgatg 32100 

atatagcgcc cgacagcagt ggtggcctct gccgcgggta aaaagtacag cgggaatatc 32160 

tcggcgccca tcacgccgta gcttgcgttg agcacgacct tgagggcctg gctgattacg 32220 

gtatactgct gccgctgctc ctccgtaatg gatgtgctct ttgagaggct cttgtaatag 32280 

ttgacgcgca ggtcccgcag cgagccgatt atcatcgatg tcaggccgtt gttttttgta 32340 

catacccagt ggttggtatc ggggatggtg ttctttttgc attctgcatg cacgcaccgg 32400 

acggtctcgt acgagaggtt cctcaccttt atgatactgg gatacaggct cgcaaagtcc 32460 

atcaccgtaa catcaaagtg tatgccctct tcaggctcga cgacaaggcc cccgcggaac 32520 

tttttatcct ttattaccgc gtcgttgctc acctcgcgcg acctgccctc cagctcgtcc 32580 

ctccgcggta tgagcgcgtt tcgctgtctg tgctcatagt acagcaggct gcgtatccac 32640 

tgcgagacgc ccatgcggga catgtcatcg atgggcatcc gggctattct gctggtcacc 32700 

accagcaggt ccatgagtat ctcgttgcca aaggtgctaa gctcgagcgt caggcgcgcg 32760 

tcgtgatagc aatagtttgc agtctggtat aaggtgagat cccccagttt gaccccatag 32820 

tcgaccttgc cctcgccgag catcgccttt gtgacgctgt taagggaata gtccgtgtac 32880 

tttgccgcaa aggcgtacag ctggaatgac ctgttcgaga aggtcctgta caggtccagg 32940 

tggactccgt gccggagcgt ggcagaatcc cgcatcatgt acaaaggaat gtcagagtca 33000 

gatactccga ggcgccgtgc cctgttgagc atgtacggca tgtcaaagtc gtcgccgttg 33060 

tacgtcagaa caaacgggta cgagcctatt accgatagcg cgtcgcggat catgtcagct 33120

tccttgtcgt agaataccac ctcgacaccg ggggtcacgc cgttctcgcc ctcttctgcg 33180 

ccgctcctca ggacgaatac ctgttttagg ccgtcggtgg cggcaaaccc caccgccgta 33240 

accctcctgt cggatatctt ggggtcgggg atcctgccct cctctgaatc cacctcgata 33300 

tcaaagctga ggcgccgtat cctgggtatg ggctggttga gcaggtccgc ccaccccgct 33360

atgaactcgc ggaactcttt tctgtccgcc atgccctcgt ctacaacctt gtcccagagg 33420 

aggctcttga gggccagctt tacctcgtcg gatatgggca tgtcatgcgg gattaccttg 33480 

ccgccggata ccgaatagta cctgcccacg accaggctct tgtcgtacag atagttctca 33540 

tagtacttta tgtcggattc ccacgtgtcc atgatgttgc ggatgctctt ctccgagttg 33600

gtcccgccta tggcaagggg gtcggccaca gttatcttgg tgacgggcac atccttgtcg 33660 

gctatcaggt cgtgccgcat gacctgctcc gttcctagca catcctccct gccttcaagc 33720

tccccaagct cggagggggg ctgcctcgta tagcagtagg gcttgtgccc cgtattgtcc 33780 

gtccagtgta cgatcttttg tgattccggc tcgtaaaact tgaggacgac cgcccctgcc 33840 

tggctgtcgt atgttgcaga taccagcagc gacgggggta tctctacggc atcttgcacc 33900 

gtcaccgcac cggcacctcc ttgctgcctc cgggcatcct tgacgcccag tacgagatct 33960 

tttccttgtc catcatggtt atctcggagg atgtctcttt taccagccgg gaggtgttct 34020 

cggggtaggt atcaaggcag acaaaccgcc tgatccctat cgttacggcc atcttggtac 34080

actccagaca cggcgagaac gtggtgtaca tggtggcccc cccgcccccc gcgcctatcc 34140

cgagtatcgc acagtgcatt atagcgttgg cctctgcatg gttgcacagg caccggtcca 34200 

gggcctcgcc tgacttgatc ctgccctcga tgcgctcggc acacctctcg cagccgccct 34260 

cgtagcagtt cttgacgcca ggaggcgtcc cgttataccc tgtggcgagc tgccggtggt 34320 

ccctcactat tacggccccc accttgcgga ctatacagtt ggatcggagc tttgcaagct 34380 

ccgcctgcag catgaaatat tcatcccagg tagggcgctc aaagccgctc acgggcagcc 34440 

tgcccccgcc cggcatatta tggtatatgc gggacggggc cgtccacccg cacccccgta 34500 

tatggatctg cgatcagggg gtagaaacca taaaacaaca ggccgcggca gggcgcgcgt 34560

ggagactggg cacataacgg gcaggtacat cgagcccggt gccgtcgaga ggcgcgacta 34620 

ccaggtgggc ctggcggaac aggccatacg ggagaactgt atcgtggtgc tcccgacggg 34680 

cctcggcaag actgccgtcg ccctccaggt gatcgcccac tatctcgacg agggccgcgg 34740

ggcgctcttc cttgccccta caagggtcct ggtaaaccag caccgccagt tcctgggcag 34800 

ggcccttacc atatccgata ttacactggt cacgggagag gacaccattc cccggcgcaa 34860 

aaaggcgtgg ggaggcagcg tgatctgcgc cacgcccgag atagcaagaa atgatataga 34920 

gcgcggcctg gtcccgctcg aacagttcgg cctggtcata ttcgacgagg cccacagggc 34980 

ggtgggcgac tatgcctatt cttccatagc gcgggcggta ggggataact ccaggatggt 35040 

gggcatgact gcgacgcttc ccagcgagag ggagaaggca gacgagataa tgggcaccct 35100 

gctctccagg agcatagccc agaggacaga agacgacccg gacgtaaagc cctatgtaca 35160 

ggagactgcc accgagtgga taaaggtgga tcttcccccc gagatgaagg agatacagag 35220 

gctcctcaag ctggccctcg acgagaggta ttcctccctc aagaggtgcg ggtacgatct 35280 

tggctcgaac aggtcgctct cggcgctgct ccggctgcgc atggtggtgc ttggcggcaa 35340 

caggcgcgcg gccaagccgc tgttcactgc gatacgcata acgtacgcgc taaacatatt 35400 

cgaggcgcac ggggtcacgc cctttctaaa gttctgcgag aggacctcca agaaaaaggg 35460 

cgtcggcgtg gcggagctgt tcgaacagga ccggaacttt acaggggcca tcgcgcgcgc 35520 

aaaggccgcg caggcggcag gcatggagca tcccaagata ccaaagctcg aggatgccgt 35580 

ccgcggggcc cggggaaagg cgctggtctt tacgagctat cgtgattctg tcgacctcat 35640 

acactcaaga ctcaaggcgg ccgggataaa ctcgggcatc ctgataggaa aggcgggaga 35700 

aaagggccta aagcagagaa aacaggtgga gactgtggca aagttccgtg acggcgggta 35760 

cgacgtgctg gtatcgacga gggtcggcga ggaggggctc gacatatcgg aggtcaacct 35820 

ggtgatattc tatgacaatg tgccaagctc gatcaggtac gtgcagagga gggggagaac 35880 

aggcagaaag gacgccggca ggctgatagt attgatggca aaggggacga tagacgaggc 35940

atactattgg attggtcggc gcaagatgag cgccgccaag ggcatgggtg agaggatgaa 36000 

ccggtcgctg gcggcaggcg gggctgctgc caaggccgct ccaaagggac tcgaggggta 36060 

cttttagccg aggcgcttta tcacgtgata gccaaactct gactttatgg gttccgagat 36120 

ctcgcctatc tgcagccgga acgcggcctc ttcaaacggc tttaccatct ttcccctgcc 36180 

aaagtagccg agactgccgt ccctctttgc actgcccccg tccatggaga gctcctttgc 36240 

gagcctgcca aacttttccc cggccttgag gcgctctttt actgctagcg cctcgccctg 36300 

tttttttacc agtatgtgcg agcactttat cttgtctgcc atgtgcgctg ctcctttgta 36360 

ccctgctata acttgtcgtg ctgccggggc gcggtcatcc cgcagggcct gtattctgcc 36420 

aggaactgtt aatccgcagg gactggtttc cccgtattat cctgtcatat acagggggga 36480 

ttcggcggtc cacgtgtatt aacacctaaa gcagggataa acgtgtgaga acaagtgggc 36540 

acccggaacg aatgttacgg catactggag gataaccaga taaaggaact agaacaaggc 36600 

agcggcatcg acgtaccgtt gctggaccac gagggcagtc agttccattc caacagaata 36660 

catcttacac tgggcaatat cgtatctccg cttgagcttg gtgacggcag tatgacaaac 36720 

cccgggacgg acctgacacc atacgacgta aagtcaatag gcatggggcg caccataaag 36780 

cgatatgcaa agtaccgttc tgaaggatcg caggccgtcc gcatggatgt catattcatg 36840 

tcccgtgccg cctgggatga gatggataaa ggcaaggcat gaccggccgg ttttggggca 36900 

tactgccggt atacagcggg aacaggcatt cagagacttt ggcggattcc gtgtgacccg 36960 

ccccaagcta aacttttaat tgggatccgg cgagccggcg cgtgtcatcg tactttacca 37020 

taaagaccgc caacctggcc ctgcccgacg tggtcaaaaa gtacaaccac gtcctggcat 37080 

gcaagagcga ggtgatgagg gccgagaagc agatccagac gtccatctcc tcgtctagcg 37140 

ggctcgacaa gtactcggag ctcaagcaac agttcaactc ccggataacc gagttctacc 37200 

gctcgataga agagctggaa aagaccggtg cggtggtcaa gagcatagac gagggcctgc 37260 

tggactttcc cgcaaagcgc tttggggacg acatctggct gtgctggaag acaggcgagc 37320 

gcgagatcaa gttctggcat gaaaaggact ctggttttgg cggaagaaag cccatagagg 37380 

taagtgacga gtcactagtg tagatgctct ccgcctggtt gcgcgtaata cgcgtccgct 37440 

tcctgctcgc gtcggtgata gccgtctcgg cgggcctcgc cctctcctgg tggcacggcc 37500 

acgaaataga cgcattctcc gccgcgctca ccatggccgg cgtggccgcg ctccacgcaa 37560

gcgtggacat gctcaacgat tattcggact acaagcgcgg catagatacc ataaccaaga 37620

ggaccccgat gagcggcgga acaggggtgc tgccagaagg cctgcttacc cccggccagg 37680

tgcaccgcgc cggcatcata tcgctggtcc tgggctctgc tgtcggcgcg tactttgtgg 37740

tcacaacggg gcccgtcata gccatgatac tcggctttgc cgtagtctcg atatactttt 37800

actcgacgag gattgtagac tcgggcctct ccgaggtctt tgtggccgtc aagggggcga 37860

tgatcgtcct tggcgcctac tacatacagg cgcccgagat aacgcctgcc gccgttctgg 37920

tgggggcggc cgtgggcgcc ctctcgtcgg cggtcctctt tgtggcgtcg tttccagacc 37980

acgatgcgga caagtcccgc ggcagaaaga cgcttgttat aatcctgggc aaggagaggg 38040

cctcgcggat cctctgggtg ttccccgcag tggcatactc gtccgttata acgggggtca 38100

tcctgcagtt cctgccggtg catgcactaa ccatgctgct tgcagccccc cttgcagtaa 38160

ttgcggcaaa aggccttgcc agggagtacg gcggggacgg gatcatacgg gtcatgcgcg 38220

gcacgctgcg gtttagcagg gttgcaggcg ccctgctggt gttgggcatt ctgttgggct 38280

gagtggagct aggttcgaga cgatgtaagc ttaaactagg tatgcaattg gatcacttag 38340

attctactaa cacgccatcg tcttttacat caacgaatat agttcgtata acctgcgagt 38400

tatcatcatc cctgacgttc tcgggagtta tttcttttga ttccccgtct tgtagaacaa 38460

cgctgtactt atccataatt aggtaaaact tctacatgta taaaaacgta ctatcttcac 38520

ggggccccgg caggcctttg ggatttttcg ctgcgtcaag tccgagtgtt ttcagggtgt 38580

gtgggtactc tgaaaaatcc cgggggcttt ccggcagggt acacaagaat agttaagtaa 38640

tgcagcattc aaagtaaaac atggacatga acaagctctg cgggatatgc atgaggagtc 38700

tggtgggtga gatcatgcct tatttccatt acaagtgcca taccgcctgt gtaaaatggc 38760

atgaatccaa tcccgggttg tgtgccctgt gcaaaaagga tgtcacgggg attaaacatt 38820

ttacaaagat agggggaatg gtataccatg aagactgttg tacaatattt gtggaaaaag 38880

tcctgggtca aaagaacaac cccccaaaat tcccgtatac gtacagtttt acatgtccgt 38940

cgtgctcaca cgaggacagc gtggatacgg ccgtcaagat ggacaccggg ccgcagagat 39000

tcatctgcaa cgggtgcaga aaaaaggtaa aagccacggg taaaaggata agataggcat 39060

cttgattggc cccgccgtat acaagatgaa atcatgcatt ctaaatacca gggcttgaga 39120

acgcaataaa accccggatg cctgccggaa gagtccggtc ataaaatccg gcccccgtac 39180

cggaggatct gtcggttgtg ggatggatat catccacttg ccattacatc acgccaattc 39240

gcgtgcagtc ctgcccgtgc ggcagggata caaagccatg atgcagaact ggaggcaccc 39300

atccggggaa actgcaggct gccccggaat ctaaaggtgc aaatggctgc agaacccccg 39360

gaccggcggg tcgcaccccg ggcgggatac caacgaacgg acccgcaccg tacacttcat 39420

acacataaat cccgcctgaa cggtcgtccg cgcatgatca gcgggcacgc cacggccgag 39480

ggtacacgca ggatagccga gatgtcgggc gcccatatcg acaactacaa gatggtcgac 39540

gggctgcacc tctccaacgt ggggatgggc acctaccttg gcgacgcgga tgacgccacc 39600

gacagggccg tcacggacgc agtcaagagg tccgtcaaaa caggcataaa cgtcatagat 39660

acggcgataa actaccgcct ccagagggcc gagcgctctg tcggcagggc cgtcacggag 39720

ctctcagaag aggggctcgt atcaagggac caaatattca tatcgacaaa ggcgggctat 39780

gtaacaaacg actccgaggt ctcgcttgac ttttgggagt atgtgaaaaa agagtacgtc 39840

gggggcggcg tgatccaggc aggcgacata tcctccggat accactgcat gaagcccgcc 39900

tatctagagg accagctgaa gaggagcctt gcaaacatgg gcctcgactg tatcgacctt 39960

gtctacgtgc acaaccccgt cgaggggcag atcaaggacc gccccatacc ggagatcctc 40020

gactgtatag gagaggcctt tgccatgtac gagaaggcaa gggaggatgg ccgcatcaga 40080

tactatgggc tcgccacgtg ggagtgcttt cgtgttgcag gggacaaccc gcagaatgtc 40140

cagctcgaag acgttgtaaa gaaggccaaa gacgcaggcg gggacaacca cggattcaag 40200

ttcatacagc tgcccttcaa ccagtacttt gaccaggctt acatgctaaa gaaccagacg 40260

gtggacggca gaaagctgtc catactggat gcggcagtat cccttggcgt cggtgtgttc 40320

acgagtgtcc cgttcatgca aggcaagctg ctcgagcctg gcctgctgcc ggagtttggc 40380

gggctctccc ccgccctgcg atccctgcag tttatcaggt ctacaccagg cgtgcttgcc 40440

cccctgccgg ggcacaactc agctgcgcat acagacgaga acctcaagat catgggcgtg 40500

ccccccatcc cgcctgacaa gttcggggag cttgtggcca gcctcacctc gtggtcgccc 40560

ggtcagaaat agccggtcag ctgcctctcg ggcattatct ggtcgagcac cttttttgag 40620

agccgtgaat cggcggaatc ctgcacgttg cgccgggccc ttgccacgtt ggcaggatac 40680

aggtctatcc cggtaaagcc cctcttgagg catgcagaga ctattcccgt ggtccccctg 40740

cctgcaaacg ggtccagcac gtaatcgccc tcttttgtgg caaactttac tatcctggat 40800

acaaggtctt ctgggaatac cgcaaagtgc tcgtttccat ggtgcgcctt tgtggatatc 40860

tcccagacgt tccccgggtt cttgccccgc gggttgcatg cggcaaatat cggatagtgc 40920

tcgtggcccc ctatcctctt gcgcgtcgca tgcctccgga acttgcgata gcacgtgggg 40980

cagtactttt cggggtcata gccgtgggcc catgatattt ccccggtggt tggcagctcg 41040

tcaaacggcg taccaggcgt tgagccgtgt atcacggctg caatcctccc tattgcctcg 41100

ggatccctct tcccgggggc gaactgcagc cggtcatttg cgggtttgct gtttatcccg 41160

ctcagggcct cgttgccctg gacgcgtatc gggtttatgt cataggcggg ggtatccgac 41220

tttgagagga ccagaacaaa ctcgtacgcc tgcgtcaggt tttgccgcga gctttgcgag 41280

atggcgtttc gcttgtacca gattatatcc tcctggaaat ggtacccaag atccaccagc 41340

cttagcgcga gccggtgcgg gaccatcagc ttgtggcgcc gcctcctggt atcacctatc 41400

actatgaaga ggctcccgtc gtctgttagc aggtccatgc agctcttgaa tactcctgcc 41460

agctcctcga tgaactcgtc tggcgtcttt tcctggccca gctcggaggg ctccgacccg 41520

tactttctgt gcccgtaata gggaggggat gttaccgcca gtctgtacct gccgcgctcg 41580

gctgtattct ttgccagccg cggcagcacc tcccgggcgt cgccctgcag tatctggaac 41640

ttttcactca agataggccc ccgtgccatc catctgcccc tgcgcgatcc gacaagtcgt 41700

attcatcttg taccgcggca cccgcgccgt cttaaaatct ttgtagctta taccggcgcg 41760

ccgcagatgc ggtacaatcc ctccggtgct cccgcgatcc ggcgcggtgc catcagccgc 41820

cccgtttccc cttccggggg ccccgccacc atacacgtgg tataaacaga ggccggacgg 41880

cgcggaccac atgtggataa aggacgagtt tcttggcaaa ggcaacaaga tgaggctgct 41940

ctacatcata ctgcccatct acgggtacct cttcctggag tactggccgt tcctgccctg 42000

gatggctaca ttctggtggt cggtggcgct cagcccccct attctcctga tgccttatgc 42060

cggggaggcc ataggtcagc tgatcggcgg gcatgtattg tttggagttg tcacaaagta 42120

tgtctatgcg gcagtatggc tgggcatggc acacgggata atcctcctga cagggcgcct 42180

cagggccagg gctggtaccc tgcgcgaatc ccccgcatag ccccggcagg gcccgttgtt 42240

ccggatggcc aaggccggcg catacatccc atgatgcata gaccgggggg acatgatcgc 42300

agcagatcgt tccatgccgc ccccgtacgc tctggggcgc acctagtcag ggcggggccc 42360

cccgcggtcc aattaaatac ggcaaggaac ggggggtctc gttgaaactg cagggcagga 42420

ctgccgtgat cc 42432



<210> 3 
<211> 10419 
<212> DNA 

<213> Cenarchaeum symbiosum 



<220> 
<221> CDS 
<222> (1)..(10419)



<400> 3 

ate ccc gcg ccg cca gga gag ggc age ctt ggc ggg gtg gca ata tec 

Met Pro Ala Pro Pro Gly Glu Gly Ser Leu Gly Gly Val Ala lie Ser 

15 10 15 



48 



gac gac ggg agg tac atg tac gca ate ggc agg gat ctg etc aca gta 
Asp Asp Gly Arg Tyr Met Tyr Ala lie Gly Arg Asp Leu Leu Thr Val 
20 25 30 



96 



tac egg tat aca atg aac ccg ccc cat gac ata gec teg gee gcg etc 
Tyr Arg Tyr Thr Met Asn Pro Pro His Asp lie Ala Ser Ala Ala Leu 
35 40 45 



144 



ggt gcg cag tea ttt tct ctg cct ggc ggc ate age ccc gee ccc ggc 
Gly Ala Gin Ser Phe Ser Leu Pro Gly Gly He Ser Pro Ala Pro Gly 
50 55 60 



192 



gcg ccg ace ggc ctt gac ate teg gat gac ggc cgc cac ctg tac gtc 
Ala Pro Thr Gly Leu Asp lie Ser Asp Asp Gly Arg His Leu Tyr Val 
65 70 75 80 



240 



ccg gac gaa aac ggc gtc gtg tac agg ttt gat ctg gaa age ccg tac 
Pro Asp Glu Asn Gly Val Val Tyr Arg Phe Asp Leu Glu Ser Pro Tyr 
85 90 95 



288 
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agg eta gac ggc ggc acg ttt ggc tct tct gtt tat gtg gga tec gac 336 
Arg Leu Asp Gly Gly Thr Phe Gly Ser Ser Val Tyr Val Gly Ser Asp 
100 105 110 

gtt gec gcg ccc cgc ggc gta tac gtg gcg ccg ggc ggc age etc atg 384 
Val Ala Ala Pro Arg Gly Val Tyr Val Ala Pro Gly Gly Ser Leu Met 
115 120 125 

ctg gtc teg gat agt gca gac ggc acc ate cac agg tac gag ctg gca 432 
Leu Val Ser Asp Ser Ala Asp Gly Thr lie His Arg Tyr Glu Leu Ala 
130 135 140 

agc ccg tac gag ccg gcg ggc gcg gca aac agg gga tca ttc gac gtg 480
Ser Pro Tyr Glu Pro Ala Gly Ala Ala Asn Arg Gly Ser Phe Asp Val
145 150 155 160

tcg gat atg gac ggc tcg cct gtc ggg gcg ggg ttt gcg ggc ggc ctg 528
Ser Asp Met Asp Gly Ser Pro Val Gly Ala Gly Phe Ala Gly Gly Leu
165 170 175

cac atg tat gtc gcg gga aac gac acc gga agg gtc tac cag tat ccg 576 
His Met Tyr Val Ala Gly Asn Asp Thr Gly Arg Val Tyr Gin Tyr Pro 
180 185 190 

gcg ggc acg cac cag ata cag gag gca gec gca ggg ccg egg ctg etc 624 
Ala Gly Thr His Gin lie Gin Glu Ala Ala Ala Gly Pro Arg Leu Leu 
195 200 205 

teg gee gtc ctg gac aaa gac gga acc ctg agg gcg gec ttt gac ggc 672 
Ser Ala Val Leu Asp Lys Asp Gly Thr Leu Arg Ala Ala Phe Asp Gly 
210 215 220 

acg gta gac gcg gga tcc gtg cag ccc ggg atg atc acc atc agg gac 720
Thr Val Asp Ala Gly Ser Val Gln Pro Gly Met Ile Thr Ile Arg Asp
225 230 235 240 

ggc cat ggc tec aac acg gga ata ccc ctt ttg ctt gee ggg ggt gec 768 
Gly His Gly Ser Asn Thr Gly He Pro Leu Leu Leu Ala Gly Gly Ala 
245 250 255 

gcg gac tct gat gtc atg aca ttt gtg gtc ccc gag aaa gac agg gca 816 
Ala Asp Ser Asp Val Met Thr Phe Val Val Pro Glu Lys Asp Arg Ala 
260 265 270 

gag get gec gca tac ggg gac cag teg ctg cat gtt ccc gec gcg gcg 864 
Glu Ala Ala Ala Tyr Gly Asp Gin Ser Leu His Val Pro Ala Ala Ala 
275 280 285 

ctg gcg ggg act ggc ggc ggg ccg ttt gtg ccc gac ttt tec ggg ggc 912 
Leu Ala Gly Thr Gly Gly Gly Pro Phe Val Pro Asp Phe Ser Gly Gly 
290 295 300 

teg ctg ctg gcg tec ctg tac egg cac gag egg ccg ttc cag ggc gag 960 
Ser Leu Leu Ala Ser Leu Tyr Arg His Glu Arg Pro Phe Gin Gly Glu 
305 310 315 320 



gag atg gca cgg acg gag aga tcc gac agg tac gcg ctt act gta act 1008
Glu Met Ala Arg Thr Glu Arg Ser Asp Arg Tyr Ala Leu Thr Val Thr
325 330 335 

gca ggc ggg agt cag atg cat gtg ggc ggc gcc ggc gga aac ate acc 1056 
Ala Gly Gly Ser Gin Met His Val Gly Gly Ala Gly Gly Asn lie Thr 
340 345 350 

tgg tac gat ctt ggc acg ccc cat gac ata acg acc ggc gtc cgc gcg 1104 
Trp Tyr Asp Leu Gly Thr Pro His Asp lie Thr Thr Gly Val Arg Ala 
355 360 365 

gga tec gac ate ctg ccg gcg tat cca tec gcg ggc aga aac gtg gtg 1152 
Gly Ser Asp lie Leu Pro Ala Tyr Pro Ser Ala Gly Arg Asn Val Val 
370 375 380 

ccg tea ata acg ggc att gcc ttc teg gat gac ggc atg egg ttg ttt 1200 
Pro Ser He Thr Gly He Ala Phe Ser Asp Asp Gly Met Arg Leu Phe 
385 390 395 400 

gca gca aac egg ggc gac cgc att cca atg tac cag ctg gac age ccg 1248 
Ala Ala Asn Arg Gly Asp Arg He Pro Met Tyr Gin Leu Asp Ser Pro 
405 410 415 

tac gac ata ggg age gcc age etc gag gga acc ctg ttt acg ggg ttc 1296 
Tyr Asp He Gly Ser Ala Ser Leu Glu Gly Thr Leu Phe Thr Gly Phe 
420 425 430 

cag teg ggc att gca ttc teg gat gac ggc acg cgc atg ttt gcc gcc 1344 
Gin Ser Gly He Ala Phe Ser Asp Asp Gly Thr Arg Met Phe Ala Ala 
435 440 445 

ctg etc acc gag aat gcc ata egg cag tac gac ctg gag ggc ccc tat 1392 
Leu Leu Thr Glu Asn Ala He Arg Gin Tyr Asp Leu Glu Gly Pro Tyr 
450 455 460 

gac ata cgc ggg gcg ggc aat gcg ggc cag tac gac ctg gac ate ccg 1440 
Asp He Arg Gly Ala Gly Asn Ala Gly Gin Tyr Asp Leu Asp He Pro 
465 470 475 480 

ctg cac cca gga ctg ctg ttc ctg ctg acc teg ggg gtg cac ttt teg 1488 
Leu His Pro Gly Leu Leu Phe Leu Leu Thr Ser Gly Val His Phe Ser 
485 490 495 

ccc gac ggg acg agg atg ttc gtc ggc gag ggg ata tea gat gcg gag 1536 
Pro Asp Gly Thr Arg Met Phe Val Gly Glu Gly He Ser Asp Ala Glu 
500 505 510 

gat gcc aac gcg aac agg gat gtc aac gtc aac ctg tgg cac agg ttt 1584 
Asp Ala Asn Ala Asn Arg Asp Val Asn Val Asn Leu Trp His Arg Phe 
515 520 525 

gat etc tec acg ccg ttt gat gtg etc acg gcg gag cgc gtg gac acg 1632 
Asp Leu Ser Thr Pro Phe Asp Val Leu Thr Ala Glu Arg Val Asp Thr 
530 535 540 

tac gag tac agc acg ggg ccg gca ggc gat ctc gag gac ctc tcc ctg 1680
Tyr Glu Tyr Ser Thr Gly Pro Ala Gly Asp Leu Glu Asp Leu Ser Leu
545 550 555 560 
tcc cct gac ggc cgc aga ttg tac acc ctg tcg agc gag agg gta agc 1728
Ser Pro Asp Gly Arg Arg Leu Tyr Thr Leu Ser Ser Glu Arg Val Ser 
565 570 575 

tea age gag tat aca ate acc egg gee cag tac tgg ctg cca gaa ccg 1776 
Ser Ser Glu Tyr Thr lie Thr Arg Ala Gin Tyr Trp Leu Pro Glu Pro 
580 585 590 

tac gac gtg acg ccg ccg tac cat gtg ccg tea ttc aac gca age cag 1824 
Tyr Asp Val Thr Pro Pro Tyr His Val Pro Ser Phe Asn Ala Ser Gin 
595 600 605 

ggg ggc aac ctg gca gac ccc tac ggg atg gcc ttc tcg ccc gac ggg 1872
Gly Gly Asn Leu Ala Asp Pro Tyr Gly Met Ala Phe Ser Pro Asp Gly 
610 615 620 

acc agg ctg ctg gtc acg ggg cac ggg cag acg aat gca aag ctg ttc 1920 
Thr Arg Leu Leu Val Thr Gly His Gly Gin Thr Asn Ala Lys Leu Phe 
625 630 635 640 

cac ctg aat ccg ccc ttt gat gtg ggc acg gec gtg ttc cac gac cac 1968 
His Leu Asn Pro Pro Phe Asp Val Gly Thr Ala Val Phe His Asp His 
645 650 655 

ggc agg ttc cgc ccc ggg ggg ccc gca age gag ate gag gcg teg ggg 2016 
Gly Arg Phe Arg Pro Gly Gly Pro Ala Ser Glu lie Glu Ala Ser Gly 
660 665 670 

ata tec ctg tct gee gac ggc tec agg atg ttt etc tec gac cgc ggc 2064 
lie Ser Leu Ser Ala Asp Gly Ser Arg Met Phe Leu Ser Asp Arg Gly 
675 680 685 

cgc ggg gee ate age cag tac acg ctg gtt gcg ccc ttt gat gtg gag 2112 
Arg Gly Ala He Ser Gin Tyr Thr Leu Val Ala Pro Phe Asp Val Glu 
690 695 700 

ttt gcg teg gat gtg tec gcg gat ggg cag etc gac gtt ggc gee cag 2160 
Phe Ala Ser Asp Val Ser Ala Asp Gly Gin Leu Asp Val Gly Ala Gin 
705 710 715 720 

gat gcg ctt ccc ggc ggg ctt gee ttc teg ccc ggg ggg acg agg eta 2208 
Asp Ala Leu Pro Gly Gly Leu Ala Phe Ser Pro Gly Gly Thr Arg Leu 
725 730 735 

ttc atg gtg gga ggc atg gac agg tea gtt cac atg tat tec ctg aat 2256 
Phe Met Val Gly Gly Met Asp Arg Ser Val His Met Tyr Ser Leu Asn 
740 745 750 

acg ccg ttt gac ctg ggc ggg gca gag cat gcg gcg tcg ttt ggc gtg 2304
Thr Pro Phe Asp Leu Gly Gly Ala Glu His Ala Ala Ser Phe Gly Val
755 760 765

ggg gac agg gtc tcg gat ccc ctc ggc atc gcc ttt ggg aac ggg ggg 2352
Gly Asp Arg Val Ser Asp Pro Leu Gly Ile Ala Phe Gly Asn Gly Gly
770 775 780

act aaa atg cta ata gcc gat acg aca ggc ttt gtg cac ggg tac gac 2400
Thr Lys Met Leu Ile Ala Asp Thr Thr Gly Phe Val His Gly Tyr Asp
785 790 795 800

ctt ggc gcc ccg tac gat ate teg ggc ccc gcg tac age ggc ata ttt 2448 
Leu Gly Ala Pro Tyr Asp lie Ser Gly Pro Ala Tyr Ser Gly lie Phe 
805 810 815 

gac gcc ggc ggc age ate egg gac gtg gcc gtc ggc ggg ggg tec atg 2496 
Asp Ala Gly Gly Ser lie Arg Asp Val Ala Val Gly Gly Gly Ser Met 
820 825 830 

ttc ata etc gag ggg gag acg gac egg gtg tat gag cac cgc ccc ggc 2544 
Phe lie Leu Glu Gly Glu Thr Asp Arg Val Tyr Glu His Arg Pro Gly 
835 840 845 

ata tac ccg gtg gtc tea gca ctg gac ggg ccg gcg ctg gtc tct get 2592 
He Tyr Pro Val Val Ser Ala Leu Asp Gly Pro Ala Leu Val Ser Ala 
850 855 860 

gca gca gat gca agg gtg ggt gcg gcc gag gtg ctc ttt gat cgc gcg 2640
Ala Ala Asp Ala Arg Val Gly Ala Ala Glu Val Leu Phe Asp Arg Ala
865 870 875 880 

gtg gat gtt ggc ggg ata gac ccc ggg ggg gtc cgc ata gtg gat gca 2688 
Val Asp Val Gly Gly He Asp Pro Gly Gly Val Arg He Val Asp Ala 
885 890 895 

gca ggc ccc ctg ccc ggc gtg gtg ate teg gat gcc gtc ata cca ggc 2736 
Ala Gly Pro Leu Pro Gly Val Val He Ser Asp Ala Val He Pro Gly 
900 905 910 

gag gat ccc ggc gtg gcc agg ttc agc ctg tcg gac gcg gag gtc ctt 2784
Glu Asp Pro Gly Val Ala Arg Phe Ser Leu Ser Asp Ala Glu Val Leu
915 920 925 

gcc gtg tec ggg tat gcc gag ccg agt ctg gtc ttt gga agg cat gcg 2832 
Ala Val Ser Gly Tyr Ala Glu Pro Ser Leu Val Phe Gly Arg His Ala 
930 935 940 

gtg ccg ggc gcg gca ggc ggc aca ttt ccc tec cag ata ggc aac gcc 2880 
Val Pro Gly Ala Ala Gly Gly Thr Phe Pro Ser Gin He Gly Asn Ala 
945 950 955 960 

acg gag ctt gtg gga teg att ccg aat ccg acc ctg gat ttt ggg acg 2928 
Thr Glu Leu Val Gly Ser He Pro Asn Pro Thr Leu Asp Phe Gly Thr 
965 970 975 

acc ctg acg ggg gcg gca ttc teg gcg gac ggg acg gtg gta ttt etc 2976 
Thr Leu Thr Gly Ala Ala Phe Ser Ala Asp Gly Thr Val Val Phe Leu 
980 985 990 

tea gac ggc ccc acc ggc agg gtg tac ccg tat tea ctg aat ate ccc 3024 
Ser Asp Gly Pro Thr Gly Arg Val Tyr Pro Tyr Ser Leu Asn He Pro 
995 1000 1005 



ttt gac ata teg tct gcg gcg cct ggg ggc ttt gta ate gtg ccc gtc 
Phe Asp He Ser Ser Ala Ala Pro Gly Gly Phe Val He Val Pro Val 
1010 1015 1020 
gga gtc tcg gac att gcg ttt tct gcc gac ggg cgg aac atg cta gtc 3120
Gly Val Ser Asp Ile Ala Phe Ser Ala Asp Gly Arg Asn Met Leu Val
1025 1030 1035 1040 

gcg gac gaa ace ggg gga ata cac agg tac ctg gee cgc age ccg tac 3168 
Ala Asp Glu Thr Gly Gly lie His Arg Tyr Leu Ala Arg Ser Pro Tyr 
1045 1050 1055 

gag ata ggc acg gat ttc atc aaa tca tcc ctg ggt gag ttt gtc gag 3216
Glu Ile Gly Thr Asp Phe Ile Lys Ser Ser Leu Gly Glu Phe Val Glu
1060 1065 1070 

aca ttc teg gcg gcg ccc cgc gtg cag gat ctt gee ggc ate gee ttt 3264 
Thr Phe Ser Ala Ala Pro Arg Val Gin Asp Leu Ala Gly He Ala Phe 
1075 1080 1085 

teg cac gac ggc atg ate atg ctt gcg gec ggc ggc teg ggg tct gtg 3312 
Ser His Asp Gly Met He Met Leu Ala Ala Gly Gly Ser Gly Ser Val 
1090 1095 1100 

cac egg tac teg ctg cca tec ccg tat gca gta teg ggg gee aaa tac 3360 
His Arg Tyr Ser Leu Pro Ser Pro Tyr Ala Val Ser Gly Ala Lys Tyr 
1105 1110 1115 1120 

gag gag acg gcg atg att ggc ggg age ccg teg ggg ctg gag ttc teg 3408 
Glu Glu Thr Ala Met He Gly Gly Ser Pro Ser Gly Leu Glu Phe Ser 
1125 1130 1135 

tec gac ggc ctg agg atg ttt gtt ccc gat gcg ggc teg gag acg gcg 3456 
Ser Asp Gly Leu Arg Met Phe Val Pro Asp Ala Gly Ser Glu Thr Ala 
1140 1145 1150 

gca gtc tac ggc ctt gee gec ccc tac ggg att ggc gag gcg gag ccg 3504 
Ala Val Tyr Gly Leu Ala Ala Pro Tyr Gly He Gly Glu Ala Glu Pro 
1155 1160 1165 

ctg ccg ccg ctg ttc ctg ggg gta ggg gca gaa gag gec acg etc teg 3552 
Leu Pro Pro Leu Phe Leu Gly Val Gly Ala Glu Glu Ala Thr Leu Ser 
1170 1175 1180 

cct gac ggc agg cac ate eta gtt ccc ggc agg ccc ggc ctg tec cag 3600 
Pro Asp Gly Arg His He Leu Val Pro Gly Arg Pro Gly Leu Ser Gin 
1185 1190 1195 1200 

tac teg ctg ttc teg acg aat ctt gag ctg tgc gcg gag ccc egg ggc 3648 
Tyr Ser Leu Phe Ser Thr Asn Leu Glu Leu Cys Ala Glu Pro Arg Gly 
1205 1210 1215 

att gac ggg gga teg tgc gaa gat ggg ata tac gee ttt gag agt ccg 3696 
He Asp Gly Gly Ser Cys Glu Asp Gly He Tyr Ala Phe Glu Ser Pro 
1220 1225 1230 

ggc agg ggc gag ggc gta teg ctt gec gec teg ata acg gcg gca gac 3744 
Gly Arg Gly Glu Gly Val Ser Leu Ala Ala Ser He Thr Ala Ala Asp 
1235 1240 1245 



ggg cca gga att ggc gag ctg cac ggg ttt gca ggc ccg ccg atg ccg 
Gly Pro Gly Ile Gly Glu Leu His Gly Phe Ala Gly Pro Pro Met Pro
1250 1255 1260 

gcg cct gtc atg gag cag gtc aca ctg gat teg egg gag ggc aca etc 3840 
Ala Pro Val Met Glu Gin Val Thr Leu Asp Ser Arg Glu Gly Thr Leu 
1265 1270 1275 1280 

agg gtc agg ctg gac agg aca gtg gac gtc gac acg gtg cgc ccc tat 3888
Arg Val Arg Leu Asp Arg Thr Val Asp Val Asp Thr Val Arg Pro Tyr 
1285 1290 1295 

aag atg tgg gtg gag gat tea gac ggc age cag aca acc ctg gca aat 3936 
Lys Met Trp Val Glu Asp Ser Asp Gly Ser Gin Thr Thr Leu Ala Asn 
1300 1305 1310 

tea aca ctg ttg aat gec gaa aac teg aac att ctg etc ttc agg ctg 3984 
Ser Thr Leu Leu Asn Ala Glu Asn Ser Asn He Leu Leu Phe Arg Leu 
1315 1320 1325 

gat gat gcg gec gca ggc aaa ata tec ggg tat aca tec ccc gtg ttt 4032 
Asp Asp Ala Ala Ala Gly Lys He Ser Gly Tyr Thr Ser Pro Val Phe 
1330 1335 1340 

cgc acg tgg teg teg ccg ttc ctg ggc aca gac gga gec acc agg ccc 4080 
Arg Thr Trp Ser Ser Pro Phe Leu Gly Thr Asp Gly Ala Thr Arg Pro 
1345 1350 1355 1360 

cat acg ctg ggc ttt gga gac gtg cgc ctt gcg gat ata tac gat gca 4128 
His Thr Leu Gly Phe Gly Asp Val Arg Leu Ala Asp He Tyr Asp Ala 
1365 1370 1375 

tec ggg gat gtc ccg teg ccg teg ggc att gag ttt tea gat gac ggc 4176 
Ser Gly Asp Val Pro Ser Pro Ser Gly He Glu Phe Ser Asp Asp Gly 
1380 1385 1390 

atg agg atg ttc gtt acg ggg ate ggc acg cca ggc ate aac ata ttc 4224 
Met Arg Met Phe Val Thr Gly He Gly Thr Pro Gly He Asn He Phe 
1395 1400 1405 

aca ctg tec gee ccc ttt gac ata aca ttg ccg aag cat tec ggc tea 4272 
Thr Leu Ser Ala Pro Phe Asp He Thr Leu Pro Lys His Ser Gly Ser 
1410 1415 1420 

acc aac ata ggc ggc ctg tec gtg tct gat ctg gca ttt gca aac aat 4320 
Thr Asn He Gly Gly Leu Ser Val Ser Asp Leu Ala Phe Ala Asn Asn 
1425 1430 1435 1440 

ggg aac agc ctc acg gtg ctc gat gtg gac ggg gtg ttg cgc gtc tac 4368
Gly Asn Ser Leu Thr Val Leu Asp Val Asp Gly Val Leu Arg Val Tyr
1445 1450 1455 

gee ctt ggg gac gat tac aat gtg gtc acc gga acc acc cag aag ttt 4416 
Ala Leu Gly Asp Asp Tyr Asn Val Val Thr Gly Thr Thr Gin Lys Phe 
1460 1465 1470 



agg att acg etc gat acc aca cag ggc ata ccc aat tec att tac aca 
Arg He Thr Leu Asp Thr Thr Gin Gly He Pro Asn Ser He Tyr Thr 
1475 1480 1485 
tct ccg gac ggc ctg tca cag ttt gtg gca tat gat gac agg att gac 4512
Ser Pro Asp Gly Leu Ser Gln Phe Val Ala Tyr Asp Asp Arg Ile Asp
1490 1495 1500 

ttg tac gtg ctt ggc age cca aac gac ata teg teg aca ace gag ata 4560 
Leu Tyr Val Leu Gly Ser Pro Asn Asp lie Ser Ser Thr Thr Glu He 
1505 1510 1515 1520 

ate ccg tat teg ctg cca agg ccg gac ccg cca acc ggc atg gac ttt 4608 
He Pro Tyr Ser Leu Pro Arg Pro Asp Pro Pro Thr Gly Met Asp Phe 
1525 1530 1535 

acg cca gac ggg cgc agg atg ttc ctg tec acc gag aac ggg ata gac 4656 
Thr Pro Asp Gly Arg Arg Met Phe Leu Ser Thr Glu Asn Gly He Asp 
1540 1545 1550 

cag tac ctg ctt tea gaa ccg ttt gca gtc acc acg teg gta ttt ttg 4704 
Gin Tyr Leu Leu Ser Glu Pro Phe Ala Val Thr Thr Ser Val Phe Leu 
1555 1560 1565 

cgc acg ate ccc att gac gga ggg gcg gag gga ata egg ttt gta gac 4752 
Arg Thr He Pro He Asp Gly Gly Ala Glu Gly He Arg Phe Val Asp 
1570 1575 1580 

aac gga agg ggc ctg ttt gtg ccg ggc gec gac ggc ate ate cag agg 4800 
Asn Gly Arg Gly Leu Phe Val Pro Gly Ala Asp Gly He He Gin Arg 
1585 1590 1595 1600 

cac gag etc ate tac ccg tac ggg gee age acg teg ttg ttg gag acc 4848 
His Glu Leu He Tyr Pro Tyr Gly Ala Ser Thr Ser Leu Leu Glu Thr 
1605 1610 1615 

gtc agg gac ggc gtg acg gac ggc ggt ccg ggc gag aac ccg gee gee 4896 
Val Arg Asp Gly Val Thr Asp Gly Gly Pro Gly Glu Asn Pro Ala Ala 
1620 1625 1630 

gga gag ate cgc ctt gcg ggc aca ttc aat gca tec gat aat gta cag 4944 
Gly Glu He Arg Leu Ala Gly Thr Phe Asn Ala Ser Asp Asn Val Gin 
1635 1640 1645 

teg ccg teg ggc att gag ttt tea ggc gac ggc acg ggg atg ttt gtt 4992 
Ser Pro Ser Gly He Glu Phe Ser Gly Asp Gly Thr Gly Met Phe Val 
1650 1655 1660 

acc ggg ttt ggg gec gcg ggc gtg aat gaa ttc tec ctg tec gec ccc 5040 
Thr Gly Phe Gly Ala Ala Gly Val Asn Glu Phe Ser Leu Ser Ala Pro 
1665 1670 1675 1680 

ttt gat aca acc etc ccg gtg cat gtg gaa ttg cac gat ata ggc ggc 5088 
Phe Asp Thr Thr Leu Pro Val His Val Glu Leu His Asp He Gly Gly 
1685 1690 1695 

cag ccg gca gtt gat ctg gcg ttt gca gaa gat ggc agg acc etc ctg 5136 
Gin Pro Ala Val Asp Leu Ala Phe Ala Glu Asp Gly Arg Thr Leu Leu 
1700 1705 1710 



ttg ctg gcc gcg gat gga aca ctg gat ttc tac agc ctt gcc ggt gat
Leu Leu Ala Ala Asp Gly Thr Leu Asp Phe Tyr Ser Leu Ala Gly Asp
1715 1720 1725 

gcc tat gat ata ggg gaa gca tec cgt act ttt caa gtg ccg ttt gag 5232 
Ala Tyr Asp He Gly Glu Ala Ser Arg Thr Phe Gin Val Pro Phe Glu 
1730 1735 1740 

gat gcc gcg ggt get gtg ccc ggc gcc ttt tac cag cct ccg gat ggc 5280 
Asp Ala Ala Gly Ala Val Pro Gly Ala Phe Tyr Gin Pro Pro Asp Gly 
1745 1750 1755 1760 

teg tct att att gcc gca ttt gac ggc agg att gac cag tat gtg gtg 5328 
Ser Ser He He Ala Ala Phe Asp Gly Arg He Asp Gin Tyr Val Val 
1765 1770 1775 

ate ccc ttc gag ttc gtg tea tat cca ctg aca agg ccc ggc acg ccc 5376 
He Pro Phe Glu Phe Val Ser Tyr Pro Leu Thr Arg Pro Gly Thr Pro 
1780 1785 1790 

aca ggg att gac ttt gcg cca gac ggg cgc tgg atg ttc ctg tec ace 5424 
Thr Gly lie Asp Phe Ala Pro Asp Gly Arg Trp Met Phe Leu Ser Thr 
1795 1800 1805 

gag aac ggg ata gac cag tac ctg ctg teg ate ccc ttt gac gtg cgc 5472 
Glu Asn Gly He Asp Gin Tyr Leu Leu Ser He Pro Phe Asp Val Arg 
1810 1815 1820 

age ctg acg tat acg gga ace att cca gta gac ggg gtg gag gga atg 5520 
Ser Leu Thr Tyr Thr Gly Thr He Pro Val Asp Gly Val Glu Gly Met 
1825 1830 1835 1840 

cag ttt gcg gac aac ggc agg gca ctg ttt ttg gcg gac agt gaa ggc 5568 
Gin Phe Ala Asp Asn Gly Arg Ala Leu Phe Leu Ala Asp Ser Glu Gly 
1845 1850 1855 

ttg att tac aat tat gac ctg gag gac ccg tat get ctg gat ggc aac 5616 
Leu He Tyr Asn Tyr Asp Leu Glu Asp Pro Tyr Ala Leu Asp Gly Asn 
1860 1865 1870 

aca att tec gtg gaa ttc teg ttt gac ggt age gtg atg tat gtg ctg 5664 
Thr He Ser Val Glu Phe Ser Phe Asp Gly Ser Val Met Tyr Val Leu 
1875 1880 1885 

gag tac gac aca aaa agg gtg gtc teg tac gag ttg gag ttt ccc ttt 5712 
Glu Tyr Asp Thr Lys Arg Val Val Ser Tyr Glu Leu Glu Phe Pro Phe 
1890 1895 1900 



gac gta tcg agc aga aca cgt gca gac acg ctg gac ata cca caa att 5760
Asp Val Ser Ser Arg Thr Arg Ala Asp Thr Leu Asp Ile Pro Gln Ile
1905 1910 1915 1920

gac tea cca aga cac gtt gca gtc teg atg ccc ggc aac cac ctg tac 5808 
Asp Ser Pro Arg His Val Ala Val Ser Met Pro Gly Asn His Leu Tyr 
1925 1930 1935 

ata aca aac teg gtg ttt ggg gaa gat gac ace ata cac tec tat gga 5856 
He Thr Asn Ser Val Phe Gly Glu Asp Asp Thr He His Ser Tyr Gly 
1940 1945 1950 
ata tct aac aat gac ata tcg tcg gca tca tac atc ggc gag gaa ggc 5904
Ile Ser Asn Asn Asp Ile Ser Ser Ala Ser Tyr Ile Gly Glu Glu Gly
1955 1960 1965 

ate ccg gaa ccc gtg ata aac ggg att gac ttt tec aac aac ggc cgc 5952 
He Pro Glu Pro Val He Asn Gly He Asp Phe Ser Asn Asn Gly Arg 
1970 1975 1980 

cgc atg ttt ctg att ggg ggc aac ggg ttc gac tac cag gtg ata cat 6000 
Arg Met Phe Leu He Gly Gly Asn Gly Phe Asp Tyr Gin Val He His 
1985 1990 1995 2000 

gac tac atg eta ggc aca aga tac gac ata tec age agg age ctg ctt 6048 
Asp Tyr Met Leu Gly Thr Arg Tyr Asp He Ser Ser Arg Ser Leu Leu 
2005 2010 2015 

gat aca tat gee att cca ggg ccg gtt gtt ttt ccc gcg ggc ctt gat 6096 
Asp Thr Tyr Ala He Pro Gly Pro Val Val Phe Pro Ala Gly Leu Asp 
2020 2025 2030 

ttc teg ttt gac agg ctg tec atg ttt ata ata age acc gec ggt teg 6144 
Phe Ser Phe Asp Arg Leu Ser Met Phe He He Ser Thr Ala Gly Ser 
2035 2040 2045 

gta tac agg tac ggc ctg gac gat ccg ttc ata gtt gaa aca atg gac 6192 
Val Tyr Arg Tyr Gly Leu Asp Asp Pro Phe He Val Glu Thr Met Asp 
2050 2055 2060 

tat cag gag tct ttc egg ctg ccc gta cca tea gcg get gat aat tea 6240 
Tyr Gin Glu Ser Phe Arg Leu Pro Val Pro Ser Ala Ala Asp Asn Ser 
2065 2070 2075 2080

ata teg gat ctg gca ttc ggc age age ggc ctg aat gec gta ata teg 6288 
He Ser Asp Leu Ala Phe Gly Ser Ser Gly Leu Asn Ala Val He Ser 
2085 2090 2095 

cac gag ggg etc gac acc ctg tac age ttt gta ctg gac ate ccg tat 6336 
His Glu Gly Leu Asp Thr Leu Tyr Ser Phe Val Leu Asp He Pro Tyr 
2100 2105 2110 

ggg gee gaa ttg gat att gac agg ctt gag ctt ccg ctg gtg ggg gtt 6384 
Gly Ala Glu Leu Asp He Asp Arg Leu Glu Leu Pro Leu Val Gly Val 
2115 2120 2125 

ccg acg gga ttc gag ttc tcg gac aac ggg cgc cag ttg tac att ggc 6432
Pro Thr Gly Phe Glu Phe Ser Asp Asn Gly Arg Gln Leu Tyr Ile Gly
2130 2135 2140 

gcg ttt cgt gac tct caa tec teg cca ggc acc ctg cct gcg ggc ctg 6480 
Ala Phe Arg Asp Ser Gin Ser Ser Pro Gly Thr Leu Pro Ala Gly Leu 
2145 2150 2155 2160 

cag cgc tat gag ctt ggc ata cca tat gac ctg get teg get gta ttt 6528 
Gin Arg Tyr Glu Leu Gly He Pro Tyr Asp Leu Ala Ser Ala Val Phe 
2165 2170 2175 



gcg cag tcc ctg gga ata ttc gat ttt cct ccc ttc aac ggc atg cgg
Ala Gln Ser Leu Gly Ile Phe Asp Phe Pro Pro Phe Asn Gly Met Arg
2180 2185 2190 

gcc aat ggc age ttg gca gga tta cat gtg ccg ccc gat gga age ate 6624 
Ala Asn Gly Ser Leu Ala Gly Leu His Val Pro Pro Asp Gly Ser lie 
2195 2200 2205 

ctg ttc agg gcc gga aat gcc gaa aga acc gta ate age tat gac atg 6672 
Leu Phe Arg Ala Gly Asn Ala Glu Arg Thr Val lie Ser Tyr Asp Met 
2210 2215 2220 

gac age cat gat ttg gat aca tta tea ttc agg gaa tea ttc aaa cca 6720 
Asp Ser His Asp Leu Asp Thr Leu Ser Phe Arg Glu Ser Phe Lys Pro 
2225 2230 2235 2240 

gat gtc gga cag teg aca ccc aac ata agg gac atg gac ata tec ccg 6768 
Asp Val Gly Gin Ser Thr Pro Asn lie Arg Asp Met Asp lie Ser Pro 
2245 2250 2255 

gac ggc atg ttc etc tac ctg ctt caa ggc gat gtt ctg gac atg tac 6816 
Asp Gly Met Phe Leu Tyr Leu Leu Gin Gly Asp Val Leu Asp Met Tyr 
2260 2265 2270 

aac ctt aca gat agt tat teg ctt gat gcc ccg gca tat gcg ggt acc 6864 
Asn Leu Thr Asp Ser Tyr Ser Leu Asp Ala Pro Ala Tyr Ala Gly Thr 
2275 2280 2285 

ctg gat ttg gaa ccg gag gat gta ata ccc agg ggg att tea ttc tea 6912 
Leu Asp Leu Glu Pro Glu Asp Val lie Pro Arg Gly lie Ser Phe Ser 
2290 2295 2300 

egg gat ggc acg agt ctg ttt atg aca ggc gaa gac gtg gac cac att 6960 
Arg Asp Gly Thr Ser Leu Phe Met Thr Gly Glu Asp Val Asp His lie 
2305 2310 2315 2320 

cac gaa tat gca ttg aat gaa cca tgg gac ata cgc aat gcc ata ctt 7008 
His Glu Tyr Ala Leu Asn Glu Pro Trp Asp lie Arg Asn Ala lie Leu 
2325 2330 2335 

gca ggc tec ctg tec ata age gca gtg aat ggt gca ccg egg ggg ctg 7056 
Ala Gly Ser Leu Ser lie Ser Ala Val Asn Gly Ala Pro Arg Gly Leu 
2340 2345 2350 

gat ata teg gag gat ggc aca act gca cat act atg cgc ggg cgt gac 7104 
Asp He Ser Glu Asp Gly Thr Thr Ala His Thr Met Arg Gly Arg Asp 
2355 2360 2365 

ttt gac acg ggg ccc gca tec ctg gta aac cac ata ttg cca ggc caa 7152 
Phe Asp Thr Gly Pro Ala Ser Leu Val Asn His He Leu Pro Gly Gin 
2370 2375 2380 

tat tec ctg ctg acg gat gcg ccg gcg ttt gca tac ccc gtg gag gag 7200 
Tyr Ser Leu Leu Thr Asp Ala Pro Ala Phe Ala Tyr Pro Val Glu Glu 
2385 2390 2395 2400 

gag ggt gca ccg ggg gat ctt gca ttc tec gat gac ggc atg cgc atg 7248 
Glu Gly Ala Pro Gly Asp Leu Ala Phe Ser Asp Asp Gly Met Arg Met 
2405 2410 2415 
ttc gtg gcg ggc gta aac aac cat tta aga cag tac aac ctg ctg tcg 7296
Phe Val Ala Gly Val Asn Asn His Leu Arg Gln Tyr Asn Leu Leu Ser
2420 2425 2430

ccg tat gac act gaa aat gca gaa cat ttc atc tcg acg gat ctg ctg 7344
Pro Tyr Asp Thr Glu Asn Ala Glu His Phe Ile Ser Thr Asp Leu Leu
2435 2440 2445

act gcg gac agg ggc ccc acg ggt ctt gta ttt tca gat gag aac gac 7392
Thr Ala Asp Arg Gly Pro Thr Gly Leu Val Phe Ser Asp Glu Asn Asp
2450 2455 2460

ttt ttc agc aca ggc gcc agg gcc caa ttt gtg cgc cag ttt acg aca 7440
Phe Phe Ser Thr Gly Ala Arg Ala Gln Phe Val Arg Gln Phe Thr Thr
2465 2470 2475 2480

aac cgc ccg tac gac gca tcc aca ata aca ctg agt gac aac gga ctg 7488
Asn Arg Pro Tyr Asp Ala Ser Thr Ile Thr Leu Ser Asp Asn Gly Leu
2485 2490 2495

tac aag gtg agc gtg gac ggc ctg ccg tcc ggc ata cgg ttt acc ccc 7536
Tyr Lys Val Ser Val Asp Gly Leu Pro Ser Gly Ile Arg Phe Thr Pro
2500 2505 2510

gac ggc atg aag atg ttc ata tcg ggc cag gag acg gcc atg ata tac 7584
Asp Gly Met Lys Met Phe Ile Ser Gly Gln Glu Thr Ala Met Ile Tyr
2515 2520 2525

cag tat tcc ctg ccg tcc ccg tat gac aca tcc ggg gcg gtc agg gac 7632
Gln Tyr Ser Leu Pro Ser Pro Tyr Asp Thr Ser Gly Ala Val Arg Asp
2530 2535 2540

agg gtt gag ata gtc gca ggg ctc ttt aga aat gca ggt ttg tcc gtc 7680
Arg Val Glu Ile Val Ala Gly Leu Phe Arg Asn Ala Gly Leu Ser Val
2545 2550 2555 2560

ggg ttg aac gag ccc agt cct tcc ggc ttt gac ttt tcg gag gac gga 7728
Gly Leu Asn Glu Pro Ser Pro Ser Gly Phe Asp Phe Ser Glu Asp Gly
2565 2570 2575

atg gag ctg tac gtg acg ggg tcg ggc ctt gtt cac agg tat ttc ctg 7776
Met Glu Leu Tyr Val Thr Gly Ser Gly Leu Val His Arg Tyr Phe Leu
2580 2585 2590

cca tcg cca tac ggc ctc gaa gat gca gcg tac ggg ggc agc ttc cac 7824
Pro Ser Pro Tyr Gly Leu Glu Asp Ala Ala Tyr Gly Gly Ser Phe His
2595 2600 2605

acg ttc agg gag agc acg ccg ctg gga gtg gtg gtg cgg ggg gat gcc 7872
Thr Phe Arg Glu Ser Thr Pro Leu Gly Val Val Val Arg Gly Asp Ala
2610 2615 2620

atg ttt gtg gcc ggg gac agt act gat tcc ata ttg aaa tat tcc ctg 7920
Met Phe Val Ala Gly Asp Ser Thr Asp Ser Ile Leu Lys Tyr Ser Leu
2625 2630 2635 2640

aac gca caa cct gtc ggc aac ata acc cat gcc gat acg cgc gcc ggg
Asn Ala Gln Pro Val Gly Asn Ile Thr His Ala Asp Thr Arg Ala Gly
2645 2650 2655 

att gcc gac agg gcg gag ate gtg ttt ggg gca atg gca gat acg cgc 8016 
lie Ala Asp Arg Ala Glu lie Val Phe Gly Ala Met Ala Asp Thr Arg 
2660 2665 2670 

gcc gag att etc gac ggc gcc gat gta gtt cat aag agt gtg aaa att 8064 
Ala Glu lie Leu Asp Gly Ala Asp Val Val His Lys Ser Val Lys lie 
2675 2680 2685 

gac gta ttc cca ata teg gag ggc ata aca gtg ggc agg gca ctt tat 8112 
Asp Val Phe Pro lie Ser Glu Gly He Thr Val Gly Arg Ala Leu Tyr 
2690 2695 2700 

cca gag gac gcc gcc ata ctt gat gac ggc gcg aat gcc acg cat aat 8160 
Pro Glu Asp Ala Ala lie Leu Asp Asp Gly Ala Asn Ala Thr His Asn 
2705 2710 2715 2720 

agg gtt gta ate att gtt cac gac ata aca gaa ggc gat gcg ccg tec 8208 
Arg Val Val He He Val His Asp He Thr Glu Gly Asp Ala Pro Ser 
2725 2730 2735 

ata cat gat gag ccg att gcc gtg ggg att tac gcc etc ggc cct atg 8256 
He His Asp Glu Pro He Ala Val Gly He Tyr Ala Leu Gly Pro Met 
2740 2745 2750 

gat aca ate gcc gtg gtt gat etc cac cgc ctg gcc gta tec gca tec 8304 
Asp Thr He Ala Val Val Asp Leu His Arg Leu Ala Val Ser Ala Ser 
2755 2760 2765 

ttg tec ggg ggt gat tec ccg teg gcc tea gat gca tec gga gta gtg 8352 
Leu Ser Gly Gly Asp Ser Pro Ser Ala Ser Asp Ala Ser Gly Val Val 
2770 2775 2780 

gcc gag agc cgc aga aac gcg gtg gac agg cct ggc gtg gaa gag cgc 8400
Ala Glu Ser Arg Arg Asn Ala Val Asp Arg Pro Gly Val Glu Glu Arg
2785 2790 2795 2800

ata gga cat ggt gta tcc ctg gag gcg gcc gac agg cct gcc gtc gac 8448
Ile Gly His Gly Val Ser Leu Glu Ala Ala Asp Arg Pro Ala Val Asp
2805 2810 2815

aac atg atg gat acg gat agt gcc ggc gtg tac gac cgc agt ccg gac 8496
Asn Met Met Asp Thr Asp Ser Ala Gly Val Tyr Asp Arg Ser Pro Asp
2820 2825 2830

gac ggg ccc gcc gta tcc gac agg tcc gcg ctg ggg ctt gcc cgg atg 8544
Asp Gly Pro Ala Val Ser Asp Arg Ser Ala Leu Gly Leu Ala Arg Met
2835 2840 2845 

gca gcc gac agg cct gca gtc gat gac atg atg gat acg gat agt gcc 8592 
Ala Ala Asp Arg Pro Ala Val Asp Asp Met Met Asp Thr Asp Ser Ala 
2850 2855 2860 



ggc gtg tac gac cgc agc ccg gac gac ggg ccc gcc ata tcc gac agg
Gly Val Tyr Asp Arg Ser Pro Asp Asp Gly Pro Ala Ile Ser Asp Arg
2865 2870 2875 2880

tcc gcg ctg ggg ctt gcc cgg atg gca gcc gac agg cct gca gtc gac 8688
Ser Ala Leu Gly Leu Ala Arg Met Ala Ala Asp Arg Pro Ala Val Asp
2885 2890 2895

gac atg atg gat acg ggc agt gec ggc gtg tac gac cgc age ccg gac 8736 
Asp Met Met Asp Thr Gly Ser Ala Gly Val Tyr Asp Arg Ser Pro Asp 
2900 2905 2910 

gac ggg ccc gec ata tec gac agg tec gcg ctg ggg ctt gec egg atg 8784 
Asp Gly Pro Ala lie Ser Asp Arg Ser Ala Leu Gly Leu Ala Arg Met 
2915 2920 2925 

gca gee gac agg cct gca gtc gat gac atg atg gat acg ggc agt gag 8832 
Ala Ala Asp Arg Pro Ala Val Asp Asp Met Met Asp Thr Gly Ser Glu 
2930 2935 2940 

age acg age agg ctt gga ccg gtt gac agg cca gaa ata gtc gag cgc 8880 
Ser Thr Ser Arg Leu Gly Pro Val Asp Arg Pro Glu lie Val Glu Arg 
2945 2950 2955 2960 

cac age ctg gee gcg tct gta tac ctg tec ggg ggc gat tec ccg teg 8928 
His Ser Leu Ala Ala Ser Val Tyr Leu Ser Gly Gly Asp Ser Pro Ser 
2965 2970 2975 

gtc gca gac ggt cat gat gtg gag tec gag ggc cgc aga gac ggg ggg 8976 
Val Ala Asp Gly His Asp Val Glu Ser Glu Gly Arg Arg Asp Gly Gly 
2980 2985 2990 

gac agg cct ggc ate gac gag cgt ata gtc ate aag ate teg tac age 9024 
Asp Arg Pro Gly lie Asp Glu Arg lie Val lie Lys lie Ser Tyr Ser 
2995 3000 3005 

cgc ggc gca gee gat gcg ccc aga gtg gag gat gca atg gag act tec 9072 
Arg Gly Ala Ala Asp Ala Pro Arg Val Glu Asp Ala Met Glu Thr Ser 
3010 3015 3020 

ggc gtg acc gcg tac agc cgc ggc gca gcc gat gcg ccc aga gtg gag 9120
Gly Val Thr Ala Tyr Ser Arg Gly Ala Ala Asp Ala Pro Arg Val Glu 
3025 3030 3035 3040 

gat gca atg gag act tec ggc gtg ace gtc ccc agg cgc agt acc atg 9168 
Asp Ala Met Glu Thr Ser Gly Val Thr Val Pro Arg Arg Ser Thr Met 
3045 3050 3055 

gac gcg ccc aca gtg gec gat gac cac age ctg gee egg acc gca tec 9216 
Asp Ala Pro Thr Val Ala Asp Asp His Ser Leu Ala Arg Thr Ala Ser 
3060 3065 3070 

ata tec gaa ggc gat tec ccg aca ttt gca gag gcg cgc cgc gcg gat 9264 
He Ser Glu Gly Asp Ser Pro Thr Phe Ala Glu Ala Arg Arg Ala Asp 
3075 3080 3085 

acc gtt ggg gat ata gac gag gtg gac gcg ccc aca gtg gee gat gac 9312 
Thr Val Gly Asp He Asp Glu Val Asp Ala Pro Thr Val Ala Asp Asp 
3090 3095 3100 



cac agt ctg gcc cgg gcc gca tcc ata tcc gaa ggc gat tcc ccg aca
His Ser Leu Ala Arg Ala Ala Ser Ile Ser Glu Gly Asp Ser Pro Thr
3105 3110 3115 3120 

ttt gca gag gtg cgc cgc gcg gat acc gtt ggg gat ata gac gag gtg 9408 
Phe Ala Glu Val Arg Arg Ala Asp Thr Val Gly Asp lie Asp Glu Val 
3125 3130 3135 

gac gcg ccc gcc gtg gcc gag agg etc ctg gca gtc etc ggc ctg cag 9456 
Asp Ala Pro Ala Val Ala Glu Arg Leu Leu Ala Val Leu Gly Leu Gin 
3140 3145 3150 

gcc cct gat teg ccg gga gtg tgg gat act gta gga ata gat cac teg 9504 
Ala Pro Asp Ser Pro Gly Val Trp Asp Thr Val Gly lie Asp His Ser 
3155 3160 3165 

gag att tea ggc gat cct gtg ccg gag cca aga gta gtg ccc agg ggc 9552 
Glu lie Ser Gly Asp Pro Val Pro Glu Pro Arg Val Val Pro Arg Gly 
3170 3175 3180 

ggt ggc ggt ggg gga ggc ggt tct tcg aac cgc ggc ctt gaa ccg cat 9600
Gly Gly Gly Gly Gly Gly Gly Ser Ser Asn Arg Gly Leu Glu Pro His 
3185 3190 3195 3200 

ggc ggc ggg tat gag att gac ttt gag ttc cgc ata gac ggc agg ctg 9648 
Gly Gly Gly Tyr Glu Ile Asp Phe Glu Phe Arg Ile Asp Gly Arg Leu
3205 3210 3215

gtg ctc ttc aat ggg aca gac gtg cta gcc gaa tcc ggc aag gac ctg 9696
Val Leu Phe Asn Gly Thr Asp Val Leu Ala Glu Ser Gly Lys Asp Leu
3220 3225 3230

ctc atc cgt ccg gtg ttc cgg ccg gag ggg agt ttc aac ata ttt gat 9744
Leu Ile Arg Pro Val Phe Arg Pro Glu Gly Ser Phe Asn Ile Phe Asp
3235 3240 3245

atg gag gtg ttg ttt acc gcc ccc ggc ggg gag ata tcg act gcc tac 9792
Met Glu Val Leu Phe Thr Ala Pro Gly Gly Glu Ile Ser Thr Ala Tyr
3250 3255 3260

tac aac agg gct gga atc ctc atg ggg att gac tgc ggc gag ctg att 9840
Tyr Asn Arg Ala Gly Ile Leu Met Gly Ile Asp Cys Gly Glu Leu Ile
3265 3270 3275 3280

atg acc gat acg acg tat tca tgc gac atg ctg gac ata ttc gga gat 9888
Met Thr Asp Thr Thr Tyr Ser Cys Asp Met Leu Asp Ile Phe Gly Asp
3285 3290 3295

gag ata tac cat gtg gag agg ctt gac gca ttc aac ggc atg gtc atc 9936
Glu Ile Tyr His Val Glu Arg Leu Asp Ala Phe Asn Gly Met Val Ile
3300 3305 3310

tcc ttg gac ggc ccc ctc gac ggg acg gtc agt gta tcg ctt cgt gac 9984
Ser Leu Asp Gly Pro Leu Asp Gly Thr Val Ser Val Ser Leu Arg Asp
3315 3320 3325

aac cac ggc atc ccg ctg gcg cag cat cgg ctg cat aaa tac gag att 10032
Asn His Gly Ile Pro Leu Ala Gln His Arg Leu His Lys Tyr Glu Ile
3330 3335 3340

ttg att ttg gac gcc gct gaa aac aga ccc ctg tca gtc tcg acg gac 10080
Leu Ile Leu Asp Ala Ala Glu Asn Arg Pro Leu Ser Val Ser Thr Asp
3345 3350 3355 3360

ccc aag ccc gtg gag gat cca tcg ccc gtg cag cat ata gag tcc ctc 10128
Pro Lys Pro Val Glu Asp Pro Ser Pro Val Gln His Ile Glu Ser Leu
3365 3370 3375

cag atg gat ccg gag ccc gtg gag tcc gag ccc ctc ccg atg gac tcc 10176
Gln Met Asp Pro Glu Pro Val Glu Ser Glu Pro Leu Pro Met Asp Ser
3380 3385 3390

gag ccc gtg gag gat ctg gaa cct gtg cag cat cta gag tcc ctc ccg 10224
Glu Pro Val Glu Asp Leu Glu Pro Val Gln His Leu Glu Ser Leu Pro
3395 3400 3405

atg gac ccc gag ccc gtg gag gat ctg gaa cct gtg cag cat ctc gag 10272
Met Asp Pro Glu Pro Val Glu Asp Leu Glu Pro Val Gln His Leu Glu
3410 3415 3420

ccc gtg cag gga tcc ccg ccc gtg cag gga ggg ccg gag tcc gtg gag 10320
Pro Val Gln Gly Ser Pro Pro Val Gln Gly Gly Pro Glu Ser Val Glu
3425 3430 3435 3440

tca ggc ata gca tac acg cta tgg cag ttc ctt tca gga ctg ctg gat 10368
Ser Gly Ile Ala Tyr Thr Leu Trp Gln Phe Leu Ser Gly Leu Leu Asp
3445 3450 3455

gcc ctg ggt ctt gcc gac ccg gat gtc gga tct gtc caa aaa acg tcc 10416
Ala Leu Gly Leu Ala Asp Pro Asp Val Gly Ser Val Gln Lys Thr Ser
3460 3465 3470

Leu Val Ser Asp Ser Ala Asp Gly Thr Ile His Arg Tyr Glu Leu Ala
130 135 140

Ser Pro Tyr Glu Pro Ala Gly Ala Ala Asn Arg Gly Ser Phe Asp Val
145 150 155 160

Ser Asp Met Asp Gly Ser Pro Val Gly Ala Gly Phe Ala Gly Gly Leu
165 170 175

His Met Tyr Val Ala Gly Asn Asp Thr Gly Arg Val Tyr Gln Tyr Pro
180 185 190

Ala Gly Thr His Gln Ile Gln Glu Ala Ala Ala Gly Pro Arg Leu Leu
195 200 205

Ser Ala Val Leu Asp Lys Asp Gly Thr Leu Arg Ala Ala Phe Asp Gly
210 215 220

Thr Val Asp Ala Gly Ser Val Gln Pro Gly Met Ile Thr Ile Arg Asp
225 230 235 240

Gly His Gly Ser Asn Thr Gly Ile Pro Leu Leu Leu Ala Gly Gly Ala
245 250 255

Ala Asp Ser Asp Val Met Thr Phe Val Val Pro Glu Lys Asp Arg Ala
260 265 270

Glu Ala Ala Ala Tyr Gly Asp Gln Ser Leu His Val Pro Ala Ala Ala
275 280 285

Leu Ala Gly Thr Gly Gly Gly Pro Phe Val Pro Asp Phe Ser Gly Gly
290 295 300

Ser Leu Leu Ala Ser Leu Tyr Arg His Glu Arg Pro Phe Gln Gly Glu
305 310 315 320

Glu Met Ala Arg Thr Glu Arg Ser Asp Arg Tyr Ala Leu Thr Val Thr
325 330 335

Ala Gly Gly Ser Gln Met His Val Gly Gly Ala Gly Gly Asn Ile Thr
340 345 350

Trp Tyr Asp Leu Gly Thr Pro His Asp Ile Thr Thr Gly Val Arg Ala
355 360 365

Gly Ser Asp Ile Leu Pro Ala Tyr Pro Ser Ala Gly Arg Asn Val Val
370 375 380

Pro Ser Ile Thr Gly Ile Ala Phe Ser Asp Asp Gly Met Arg Leu Phe
385 390 395 400

Ala Ala Asn Arg Gly Asp Arg Ile Pro Met Tyr Gln Leu Asp Ser Pro
405 410 415

Tyr Asp Ile Gly Ser Ala Ser Leu Glu Gly Thr Leu Phe Thr Gly Phe
420 425 430

Gln Ser Gly Ile Ala Phe Ser Asp Asp Gly Thr Arg Met Phe Ala Ala
435 440 445

Leu Leu Thr Glu Asn Ala Ile Arg Gln Tyr Asp Leu Glu Gly Pro Tyr
450 455 460

Asp Ile Arg Gly Ala Gly Asn Ala Gly Gln Tyr Asp Leu Asp Ile Pro
465 470 475 480

Leu His Pro Gly Leu Leu Phe Leu Leu Thr Ser Gly Val His Phe Ser
485 490 495

Pro Asp Gly Thr Arg Met Phe Val Gly Glu Gly Ile Ser Asp Ala Glu
500 505 510

Asp Ala Asn Ala Asn Arg Asp Val Asn Val Asn Leu Trp His Arg Phe
515 520 525

Asp Leu Ser Thr Pro Phe Asp Val Leu Thr Ala Glu Arg Val Asp Thr
530 535 540

Tyr Glu Tyr Ser Thr Gly Pro Ala Gly Asp Leu Glu Asp Leu Ser Leu
545 550 555 560

Ser Pro Asp Gly Arg Arg Leu Tyr Thr Leu Ser Ser Glu Arg Val Ser
565 570 575

Ser Ser Glu Tyr Thr Ile Thr Arg Ala Gln Tyr Trp Leu Pro Glu Pro
580 585 590

Tyr Asp Val Thr Pro Pro Tyr His Val Pro Ser Phe Asn Ala Ser Gln
595 600 605

Gly Gly Asn Leu Ala Asp Pro Tyr Gly Met Ala Phe Ser Pro Asp Gly
610 615 620

Thr Arg Leu Leu Val Thr Gly His Gly Gln Thr Asn Ala Lys Leu Phe
625 630 635 640

His Leu Asn Pro Pro Phe Asp Val Gly Thr Ala Val Phe His Asp His
645 650 655

Gly Arg Phe Arg Pro Gly Gly Pro Ala Ser Glu Ile Glu Ala Ser Gly
660 665 670

Ile Ser Leu Ser Ala Asp Gly Ser Arg Met Phe Leu Ser Asp Arg Gly
675 680 685

Arg Gly Ala Ile Ser Gln Tyr Thr Leu Val Ala Pro Phe Asp Val Glu
690 695 700

Phe Ala Ser Asp Val Ser Ala Asp Gly Gln Leu Asp Val Gly Ala Gln
705 710 715 720

Asp Ala Leu Pro Gly Gly Leu Ala Phe Ser Pro Gly Gly Thr Arg Leu
725 730 735

Phe Met Val Gly Gly Met Asp Arg Ser Val His Met Tyr Ser Leu Asn
740 745 750

Thr Pro Phe Asp Leu Gly Gly Ala Glu His Ala Ala Ser Phe Gly Val
755 760 765

Gly Asp Arg Val Ser Asp Pro Leu Gly Ile Ala Phe Gly Asn Gly Gly
770 775 780

Thr Lys Met Leu Ile Ala Asp Thr Thr Gly Phe Val His Gly Tyr Asp
785 790 795 800

Leu Gly Ala Pro Tyr Asp Ile Ser Gly Pro Ala Tyr Ser Gly Ile Phe
805 810 815

Asp Ala Gly Gly Ser Ile Arg Asp Val Ala Val Gly Gly Gly Ser Met
820 825 830

Phe Ile Leu Glu Gly Glu Thr Asp Arg Val Tyr Glu His Arg Pro Gly
835 840 845

Ile Tyr Pro Val Val Ser Ala Leu Asp Gly Pro Ala Leu Val Ser Ala
850 855 860

Ala Ala Asp Ala Arg Val Gly Ala Ala Glu Val Leu Phe Asp Arg Ala
865 870 875 880

Val Asp Val Gly Gly Ile Asp Pro Gly Gly Val Arg Ile Val Asp Ala
885 890 895

Ala Gly Pro Leu Pro Gly Val Val Ile Ser Asp Ala Val Ile Pro Gly
900 905 910

Glu Asp Pro Gly Val Ala Arg Phe Ser Leu Ser Asp Ala Glu Val Leu
915 920 925

Ala Val Ser Gly Tyr Ala Glu Pro Ser Leu Val Phe Gly Arg His Ala
930 935 940

Val Pro Gly Ala Ala Gly Gly Thr Phe Pro Ser Gln Ile Gly Asn Ala
945 950 955 960

Thr Glu Leu Val Gly Ser Ile Pro Asn Pro Thr Leu Asp Phe Gly Thr
965 970 975

Thr Leu Thr Gly Ala Ala Phe Ser Ala Asp Gly Thr Val Val Phe Leu
980 985 990

Ser Asp Gly Pro Thr Gly Arg Val Tyr Pro Tyr Ser Leu Asn Ile Pro
995 1000 1005

Phe Asp Ile Ser Ser Ala Ala Pro Gly Gly Phe Val Ile Val Pro Val
1010 1015 1020

Gly Val Ser Asp Ile Ala Phe Ser Ala Asp Gly Arg Asn Met Leu Val
1025 1030 1035 1040

Ala Asp Glu Thr Gly Gly Ile His Arg Tyr Leu Ala Arg Ser Pro Tyr
1045 1050 1055

Glu Ile Gly Thr Asp Phe Ile Lys Ser Ser Leu Gly Glu Phe Val Glu
1060 1065 1070

Thr Phe Ser Ala Ala Pro Arg Val Gln Asp Leu Ala Gly Ile Ala Phe
1075 1080 1085

Ser His Asp Gly Met Ile Met Leu Ala Ala Gly Gly Ser Gly Ser Val
1090 1095 1100

His Arg Tyr Ser Leu Pro Ser Pro Tyr Ala Val Ser Gly Ala Lys Tyr
1105 1110 1115 1120

Glu Glu Thr Ala Met Ile Gly Gly Ser Pro Ser Gly Leu Glu Phe Ser
1125 1130 1135

Ser Asp Gly Leu Arg Met Phe Val Pro Asp Ala Gly Ser Glu Thr Ala
1140 1145 1150

Ala Val Tyr Gly Leu Ala Ala Pro Tyr Gly Ile Gly Glu Ala Glu Pro
1155 1160 1165

Leu Pro Pro Leu Phe Leu Gly Val Gly Ala Glu Glu Ala Thr Leu Ser
1170 1175 1180

Pro Asp Gly Arg His Ile Leu Val Pro Gly Arg Pro Gly Leu Ser Gln
1185 1190 1195 1200

Tyr Ser Leu Phe Ser Thr Asn Leu Glu Leu Cys Ala Glu Pro Arg Gly
1205 1210 1215

Ile Asp Gly Gly Ser Cys Glu Asp Gly Ile Tyr Ala Phe Glu Ser Pro
1220 1225 1230

Gly Arg Gly Glu Gly Val Ser Leu Ala Ala Ser Ile Thr Ala Ala Asp
1235 1240 1245

Gly Pro Gly Ile Gly Glu Leu His Gly Phe Ala Gly Pro Pro Met Pro
1250 1255 1260

Ala Pro Val Met Glu Gln Val Thr Leu Asp Ser Arg Glu Gly Thr Leu
1265 1270 1275 1280

Arg Val Arg Leu Asp Arg Thr Val Asp Val Asp Thr Val Arg Pro Tyr
1285 1290 1295

Lys Met Trp Val Glu Asp Ser Asp Gly Ser Gln Thr Thr Leu Ala Asn
1300 1305 1310

Ser Thr Leu Leu Asn Ala Glu Asn Ser Asn Ile Leu Leu Phe Arg Leu
1315 1320 1325

Asp Asp Ala Ala Ala Gly Lys Ile Ser Gly Tyr Thr Ser Pro Val Phe
1330 1335 1340

Arg Thr Trp Ser Ser Pro Phe Leu Gly Thr Asp Gly Ala Thr Arg Pro
1345 1350 1355 1360

His Thr Leu Gly Phe Gly Asp Val Arg Leu Ala Asp Ile Tyr Asp Ala
1365 1370 1375

Ser Gly Asp Val Pro Ser Pro Ser Gly Ile Glu Phe Ser Asp Asp Gly
1380 1385 1390

Met Arg Met Phe Val Thr Gly Ile Gly Thr Pro Gly Ile Asn Ile Phe
1395 1400 1405

Thr Leu Ser Ala Pro Phe Asp Ile Thr Leu Pro Lys His Ser Gly Ser
1410 1415 1420

Thr Asn Ile Gly Gly Leu Ser Val Ser Asp Leu Ala Phe Ala Asn Asn
1425 1430 1435 1440

Gly Asn Ser Leu Thr Val Leu Asp Val Asp Gly Val Leu Arg Val Tyr
1445 1450 1455

Ala Leu Gly Asp Asp Tyr Asn Val Val Thr Gly Thr Thr Gln Lys Phe
1460 1465 1470

Arg Ile Thr Leu Asp Thr Thr Gln Gly Ile Pro Asn Ser Ile Tyr Thr
1475 1480 1485

Ser Pro Asp Gly Leu Ser Gln Phe Val Ala Tyr Asp Asp Arg Ile Asp
1490 1495 1500

Leu Tyr Val Leu Gly Ser Pro Asn Asp Ile Ser Ser Thr Thr Glu Ile
1505 1510 1515 1520

Ile Pro Tyr Ser Leu Pro Arg Pro Asp Pro Pro Thr Gly Met Asp Phe
1525 1530 1535

Thr Pro Asp Gly Arg Arg Met Phe Leu Ser Thr Glu Asn Gly Ile Asp
1540 1545 1550

Gln Tyr Leu Leu Ser Glu Pro Phe Ala Val Thr Thr Ser Val Phe Leu
1555 1560 1565

Arg Thr Ile Pro Ile Asp Gly Gly Ala Glu Gly Ile Arg Phe Val Asp
1570 1575 1580

Asn Gly Arg Gly Leu Phe Val Pro Gly Ala Asp Gly Ile Ile Gln Arg
1585 1590 1595 1600

His Glu Leu Ile Tyr Pro Tyr Gly Ala Ser Thr Ser Leu Leu Glu Thr
1605 1610 1615

Val Arg Asp Gly Val Thr Asp Gly Gly Pro Gly Glu Asn Pro Ala Ala
1620 1625 1630

Gly Glu Ile Arg Leu Ala Gly Thr Phe Asn Ala Ser Asp Asn Val Gln
1635 1640 1645

Ser Pro Ser Gly Ile Glu Phe Ser Gly Asp Gly Thr Gly Met Phe Val
1650 1655 1660

Thr Gly Phe Gly Ala Ala Gly Val Asn Glu Phe Ser Leu Ser Ala Pro
1665 1670 1675 1680

Phe Asp Thr Thr Leu Pro Val His Val Glu Leu His Asp Ile Gly Gly
1685 1690 1695

Gln Pro Ala Val Asp Leu Ala Phe Ala Glu Asp Gly Arg Thr Leu Leu
1700 1705 1710

Leu Leu Ala Ala Asp Gly Thr Leu Asp Phe Tyr Ser Leu Ala Gly Asp
1715 1720 1725

Ala Tyr Asp Ile Gly Glu Ala Ser Arg Thr Phe Gln Val Pro Phe Glu
1730 1735 1740

Asp Ala Ala Gly Ala Val Pro Gly Ala Phe Tyr Gln Pro Pro Asp Gly
1745 1750 1755 1760

Ser Ser Ile Ile Ala Ala Phe Asp Gly Arg Ile Asp Gln Tyr Val Val
1765 1770 1775

Ile Pro Phe Glu Phe Val Ser Tyr Pro Leu Thr Arg Pro Gly Thr Pro
1780 1785 1790

Thr Gly Ile Asp Phe Ala Pro Asp Gly Arg Trp Met Phe Leu Ser Thr
1795 1800 1805

Glu Asn Gly Ile Asp Gln Tyr Leu Leu Ser Ile Pro Phe Asp Val Arg
1810 1815 1820

Ser Leu Thr Tyr Thr Gly Thr Ile Pro Val Asp Gly Val Glu Gly Met
1825 1830 1835 1840

Gln Phe Ala Asp Asn Gly Arg Ala Leu Phe Leu Ala Asp Ser Glu Gly
1845 1850 1855

Leu Ile Tyr Asn Tyr Asp Leu Glu Asp Pro Tyr Ala Leu Asp Gly Asn
1860 1865 1870

Thr Ile Ser Val Glu Phe Ser Phe Asp Gly Ser Val Met Tyr Val Leu
1875 1880 1885

Glu Tyr Asp Thr Lys Arg Val Val Ser Tyr Glu Leu Glu Phe Pro Phe
1890 1895 1900

Asp Val Ser Ser Arg Thr Arg Ala Asp Thr Leu Asp Ile Pro Gln Ile
1905 1910 1915 1920

Asp Ser Pro Arg His Val Ala Val Ser Met Pro Gly Asn His Leu Tyr
1925 1930 1935

Ile Thr Asn Ser Val Phe Gly Glu Asp Asp Thr Ile His Ser Tyr Gly
1940 1945 1950

Ile Ser Asn Asn Asp Ile Ser Ser Ala Ser Tyr Ile Gly Glu Glu Gly
1955 1960 1965

Ile Pro Glu Pro Val Ile Asn Gly Ile Asp Phe Ser Asn Asn Gly Arg
1970 1975 1980

Arg Met Phe Leu Ile Gly Gly Asn Gly Phe Asp Tyr Gln Val Ile His
1985 1990 1995 2000

Asp Tyr Met Leu Gly Thr Arg Tyr Asp Ile Ser Ser Arg Ser Leu Leu
2005 2010 2015

Asp Thr Tyr Ala Ile Pro Gly Pro Val Val Phe Pro Ala Gly Leu Asp
2020 2025 2030

Phe Ser Phe Asp Arg Leu Ser Met Phe Ile Ile Ser Thr Ala Gly Ser
2035 2040 2045

Val Tyr Arg Tyr Gly Leu Asp Asp Pro Phe Ile Val Glu Thr Met Asp
2050 2055 2060

Tyr Gln Glu Ser Phe Arg Leu Pro Val Pro Ser Ala Ala Asp Asn Ser
2065 2070 2075 2080

Ile Ser Asp Leu Ala Phe Gly Ser Ser Gly Leu Asn Ala Val Ile Ser
2085 2090 2095

His Glu Gly Leu Asp Thr Leu Tyr Ser Phe Val Leu Asp Ile Pro Tyr
2100 2105 2110

Gly Ala Glu Leu Asp Ile Asp Arg Leu Glu Leu Pro Leu Val Gly Val
2115 2120 2125

Pro Thr Gly Phe Glu Phe Ser Asp Asn Gly Arg Gln Leu Tyr Ile Gly
2130 2135 2140

Ala Phe Arg Asp Ser Gln Ser Ser Pro Gly Thr Leu Pro Ala Gly Leu
2145 2150 2155 2160

Gln Arg Tyr Glu Leu Gly Ile Pro Tyr Asp Leu Ala Ser Ala Val Phe
2165 2170 2175

Ala Gln Ser Leu Gly Ile Phe Asp Phe Pro Pro Phe Asn Gly Met Arg
2180 2185 2190

Ala Asn Gly Ser Leu Ala Gly Leu His Val Pro Pro Asp Gly Ser Ile
2195 2200 2205

Leu Phe Arg Ala Gly Asn Ala Glu Arg Thr Val Ile Ser Tyr Asp Met
2210 2215 2220

Asp Ser His Asp Leu Asp Thr Leu Ser Phe Arg Glu Ser Phe Lys Pro
2225 2230 2235 2240

Asp Val Gly Gln Ser Thr Pro Asn Ile Arg Asp Met Asp Ile Ser Pro
2245 2250 2255

Asp Gly Met Phe Leu Tyr Leu Leu Gln Gly Asp Val Leu Asp Met Tyr
2260 2265 2270

Asn Leu Thr Asp Ser Tyr Ser Leu Asp Ala Pro Ala Tyr Ala Gly Thr
2275 2280 2285

Leu Asp Leu Glu Pro Glu Asp Val Ile Pro Arg Gly Ile Ser Phe Ser
2290 2295 2300

Arg Asp Gly Thr Ser Leu Phe Met Thr Gly Glu Asp Val Asp His Ile
2305 2310 2315 2320

His Glu Tyr Ala Leu Asn Glu Pro Trp Asp Ile Arg Asn Ala Ile Leu
2325 2330 2335

Ala Gly Ser Leu Ser Ile Ser Ala Val Asn Gly Ala Pro Arg Gly Leu
2340 2345 2350

Asp Ile Ser Glu Asp Gly Thr Thr Ala His Thr Met Arg Gly Arg Asp
2355 2360 2365

Phe Asp Thr Gly Pro Ala Ser Leu Val Asn His Ile Leu Pro Gly Gln
2370 2375 2380

Tyr Ser Leu Leu Thr Asp Ala Pro Ala Phe Ala Tyr Pro Val Glu Glu
2385 2390 2395 2400

Glu Gly Ala Pro Gly Asp Leu Ala Phe Ser Asp Asp Gly Met Arg Met
2405 2410 2415

Phe Val Ala Gly Val Asn Asn His Leu Arg Gln Tyr Asn Leu Leu Ser
2420 2425 2430

Pro Tyr Asp Thr Glu Asn Ala Glu His Phe Ile Ser Thr Asp Leu Leu
2435 2440 2445

Thr Ala Asp Arg Gly Pro Thr Gly Leu Val Phe Ser Asp Glu Asn Asp
2450 2455 2460

Phe Phe Ser Thr Gly Ala Arg Ala Gln Phe Val Arg Gln Phe Thr Thr
2465 2470 2475 2480

Asn Arg Pro Tyr Asp Ala Ser Thr Ile Thr Leu Ser Asp Asn Gly Leu
2485 2490 2495

Tyr Lys Val Ser Val Asp Gly Leu Pro Ser Gly Ile Arg Phe Thr Pro
2500 2505 2510

Asp Gly Met Lys Met Phe Ile Ser Gly Gln Glu Thr Ala Met Ile Tyr
2515 2520 2525

Gln Tyr Ser Leu Pro Ser Pro Tyr Asp Thr Ser Gly Ala Val Arg Asp
2530 2535 2540

Arg Val Glu Ile Val Ala Gly Leu Phe Arg Asn Ala Gly Leu Ser Val
2545 2550 2555 2560

Gly Leu Asn Glu Pro Ser Pro Ser Gly Phe Asp Phe Ser Glu Asp Gly
2565 2570 2575

Met Glu Leu Tyr Val Thr Gly Ser Gly Leu Val His Arg Tyr Phe Leu
2580 2585 2590

Pro Ser Pro Tyr Gly Leu Glu Asp Ala Ala Tyr Gly Gly Ser Phe His
2595 2600 2605

Thr Phe Arg Glu Ser Thr Pro Leu Gly Val Val Val Arg Gly Asp Ala
2610 2615 2620

Met Phe Val Ala Gly Asp Ser Thr Asp Ser Ile Leu Lys Tyr Ser Leu
2625 2630 2635 2640

Asn Ala Gln Pro Val Gly Asn Ile Thr His Ala Asp Thr Arg Ala Gly
2645 2650 2655

Ile Ala Asp Arg Ala Glu Ile Val Phe Gly Ala Met Ala Asp Thr Arg
2660 2665 2670

Ala Glu Ile Leu Asp Gly Ala Asp Val Val His Lys Ser Val Lys Ile
2675 2680 2685

Asp Val Phe Pro Ile Ser Glu Gly Ile Thr Val Gly Arg Ala Leu Tyr
2690 2695 2700

Pro Glu Asp Ala Ala Ile Leu Asp Asp Gly Ala Asn Ala Thr His Asn
2705 2710 2715 2720

Arg Val Val Ile Ile Val His Asp Ile Thr Glu Gly Asp Ala Pro Ser
2725 2730 2735

Ile His Asp Glu Pro Ile Ala Val Gly Ile Tyr Ala Leu Gly Pro Met
2740 2745 2750

Asp Thr Ile Ala Val Val Asp Leu His Arg Leu Ala Val Ser Ala Ser
2755 2760 2765

Leu Ser Gly Gly Asp Ser Pro Ser Ala Ser Asp Ala Ser Gly Val Val
2770 2775 2780

Ala Glu Ser Arg Arg Asn Ala Val Asp Arg Pro Gly Val Glu Glu Arg
2785 2790 2795 2800

Ile Gly His Gly Val Ser Leu Glu Ala Ala Asp Arg Pro Ala Val Asp
2805 2810 2815

Asn Met Met Asp Thr Asp Ser Ala Gly Val Tyr Asp Arg Ser Pro Asp
2820 2825 2830

Asp Gly Pro Ala Val Ser Asp Arg Ser Ala Leu Gly Leu Ala Arg Met
2835 2840 2845

Ala Ala Asp Arg Pro Ala Val Asp Asp Met Met Asp Thr Asp Ser Ala
2850 2855 2860

Gly Val Tyr Asp Arg Ser Pro Asp Asp Gly Pro Ala Ile Ser Asp Arg
2865 2870 2875 2880

Ser Ala Leu Gly Leu Ala Arg Met Ala Ala Asp Arg Pro Ala Val Asp
2885 2890 2895

Asp Met Met Asp Thr Gly Ser Ala Gly Val Tyr Asp Arg Ser Pro Asp
2900 2905 2910

Asp Gly Pro Ala Ile Ser Asp Arg Ser Ala Leu Gly Leu Ala Arg Met
2915 2920 2925

Ala Ala Asp Arg Pro Ala Val Asp Asp Met Met Asp Thr Gly Ser Glu
2930 2935 2940

Ser Thr Ser Arg Leu Gly Pro Val Asp Arg Pro Glu Ile Val Glu Arg
2945 2950 2955 2960

His Ser Leu Ala Ala Ser Val Tyr Leu Ser Gly Gly Asp Ser Pro Ser
2965 2970 2975

Val Ala Asp Gly His Asp Val Glu Ser Glu Gly Arg Arg Asp Gly Gly
2980 2985 2990

Asp Arg Pro Gly Ile Asp Glu Arg Ile Val Ile Lys Ile Ser Tyr Ser
2995 3000 3005

Arg Gly Ala Ala Asp Ala Pro Arg Val Glu Asp Ala Met Glu Thr Ser
3010 3015 3020

Gly Val Thr Ala Tyr Ser Arg Gly Ala Ala Asp Ala Pro Arg Val Glu
3025 3030 3035 3040

Asp Ala Met Glu Thr Ser Gly Val Thr Val Pro Arg Arg Ser Thr Met
3045 3050 3055

Asp Ala Pro Thr Val Ala Asp Asp His Ser Leu Ala Arg Thr Ala Ser
3060 3065 3070

Ile Ser Glu Gly Asp Ser Pro Thr Phe Ala Glu Ala Arg Arg Ala Asp
3075 3080 3085

Thr Val Gly Asp Ile Asp Glu Val Asp Ala Pro Thr Val Ala Asp Asp
3090 3095 3100

His Ser Leu Ala Arg Ala Ala Ser Ile Ser Glu Gly Asp Ser Pro Thr
3105 3110 3115 3120

Phe Ala Glu Val Arg Arg Ala Asp Thr Val Gly Asp Ile Asp Glu Val
3125 3130 3135

Asp Ala Pro Ala Val Ala Glu Arg Leu Leu Ala Val Leu Gly Leu Gln
3140 3145 3150

Ala Pro Asp Ser Pro Gly Val Trp Asp Thr Val Gly Ile Asp His Ser
3155 3160 3165

Glu Ile Ser Gly Asp Pro Val Pro Glu Pro Arg Val Val Pro Arg Gly
3170 3175 3180

Gly Gly Gly Gly Gly Gly Gly Ser Ser Asn Arg Gly Leu Glu Pro His
3185 3190 3195 3200

Gly Gly Gly Tyr Glu Ile Asp Phe Glu Phe Arg Ile Asp Gly Arg Leu
3205 3210 3215

Val Leu Phe Asn Gly Thr Asp Val Leu Ala Glu Ser Gly Lys Asp Leu
3220 3225 3230

Leu Ile Arg Pro Val Phe Arg Pro Glu Gly Ser Phe Asn Ile Phe Asp
3235 3240 3245

Met Glu Val Leu Phe Thr Ala Pro Gly Gly Glu Ile Ser Thr Ala Tyr
3250 3255 3260

Tyr Asn Arg Ala Gly Ile Leu Met Gly Ile Asp Cys Gly Glu Leu Ile
3265 3270 3275 3280

Met Thr Asp Thr Thr Tyr Ser Cys Asp Met Leu Asp Ile Phe Gly Asp
3285 3290 3295

Glu Ile Tyr His Val Glu Arg Leu Asp Ala Phe Asn Gly Met Val Ile
3300 3305 3310

Ser Leu Asp Gly Pro Leu Asp Gly Thr Val Ser Val Ser Leu Arg Asp
3315 3320 3325

Asn His Gly Ile Pro Leu Ala Gln His Arg Leu His Lys Tyr Glu Ile
3330 3335 3340

Leu Ile Leu Asp Ala Ala Glu Asn Arg Pro Leu Ser Val Ser Thr Asp
3345 3350 3355 3360

Pro Lys Pro Val Glu Asp Pro Ser Pro Val Gln His Ile Glu Ser Leu
3365 3370 3375

Gln Met Asp Pro Glu Pro Val Glu Ser Glu Pro Leu Pro Met Asp Ser
3380 3385 3390

Glu Pro Val Glu Asp Leu Glu Pro Val Gln His Leu Glu Ser Leu Pro
3395 3400 3405

Met Asp Pro Glu Pro Val Glu Asp Leu Glu Pro Val Gln His Leu Glu
3410 3415 3420

Pro Val Gln Gly Ser Pro Pro Val Gln Gly Gly Pro Glu Ser Val Glu
3425 3430 3435 3440

Ser Gly Ile Ala Tyr Thr Leu Trp Gln Phe Leu Ser Gly Leu Leu Asp
3445 3450 3455

Ala Leu Gly Leu Ala Asp Pro Asp Val Gly Ser Val Gln Lys Thr Ser
3460 3465 3470

<210> 5 
<211> 819 
<212> DNA 

<213> Cenarchaeum symbiosum 

<220> 

<221> CDS 

<222> (1)...(810)

<400> 5 

atg cat ggg atc gag ggc ggc cgg gga gat atg tcg gag aat ttt gtg 48
Met His Gly Ile Glu Gly Gly Arg Gly Asp Met Ser Glu Asn Phe Val
1 5 10 15

gcg ttt tgc gtg gcg tgc gcc agg gga gtc aca aag gac gag atg aag 96
Ala Phe Cys Val Ala Cys Ala Arg Gly Val Thr Lys Asp Glu Met Lys
20 25 30

tat gta gac ggg agg gtc ttc cac aaa gag tgc cat gca agg cac ggc 144
Tyr Val Asp Gly Arg Val Phe His Lys Glu Cys His Ala Arg His Gly
35 40 45

ggg cag atc cgc ttc ccc aac cca gag gtc gag cag cgc gtg gcc gag 192
Gly Gln Ile Arg Phe Pro Asn Pro Glu Val Glu Gln Arg Val Ala Glu
50 55 60

ctg aag gtg gac ctg ata cag atg aga aac cag ctg gcc gag atg aac 240
Leu Lys Val Asp Leu Ile Gln Met Arg Asn Gln Leu Ala Glu Met Asn
65 70 75 80

agg gcg tcg ggg gac gga ggg gtg cat tcc agc gcc acc tct gcg gcc 288
Arg Ala Ser Gly Asp Gly Gly Val His Ser Ser Ala Thr Ser Ala Ala
85 90 95

gag gcc gag cag cac agg gcc gag cta aag gta cag ctg gtg cag atg 336
Glu Ala Glu Gln His Arg Ala Glu Leu Lys Val Gln Leu Val Gln Met
100 105 110

aga aac cag ctg gcc gag atg aac agg aag gcc ccc gga aag ccg gca 384
Arg Asn Gln Leu Ala Glu Met Asn Arg Lys Ala Pro Gly Lys Pro Ala
115 120 125

cgg aaa aag gcc gca ggc aag act gca cgg aga aag agc ggc aag aag 432
Arg Lys Lys Ala Ala Gly Lys Thr Ala Arg Arg Lys Ser Gly Lys Lys
130 135 140

acg gtg cgc agg aag acc ggc aag agg act gcc ggt aag aag gcc ggg 480
Thr Val Arg Arg Lys Thr Gly Lys Arg Thr Ala Gly Lys Lys Ala Gly
145 150 155 160

gcg cgg agg aag act acg gtc aag agg acg gcg cgg agg aag acc acg 528
Ala Arg Arg Lys Thr Thr Val Lys Arg Thr Ala Arg Arg Lys Thr Thr
165 170 175

gca aag aag gca gcc ggc aga aag gcc ggg gcg cgc aga aag gcc aca 576
Ala Lys Lys Ala Ala Gly Arg Lys Ala Gly Ala Arg Arg Lys Ala Thr
180 185 190

gtc aag agg acg gtg cac aaa aag att gga gtg cgg agg aag act acg 624
Val Lys Arg Thr Val His Lys Lys Ile Gly Val Arg Arg Lys Thr Thr
195 200 205

gca agg agg acg gcc ggt aag agt acg gtg cgc agg aag agc aca gtc 672
Ala Arg Arg Thr Ala Gly Lys Ser Thr Val Arg Arg Lys Ser Thr Val
210 215 220

aag agg acg gtg cac agg aag acc ggc aag aag gca gta gta cgc agg 720
Lys Arg Thr Val His Arg Lys Thr Gly Lys Lys Ala Val Val Arg Arg
225 230 235 240

aag agc aca gtc aag agg acg gca cgg agg ccg gcc ggc aga aag acc 768
Lys Ser Thr Val Lys Arg Thr Ala Arg Arg Pro Ala Gly Arg Lys Thr
245 250 255

ccc gga agg gcc gcg cgc agg gcc ggc gca aag agg cgc tag 810
Pro Gly Arg Ala Ala Arg Arg Ala Gly Ala Lys Arg Arg *
260 265

cctgctgat 819

<210> 6 
<211> 269 
<212> PRT 

<213> Cenarchaeum symbiosum 
<400> 6 

Met His Gly Ile Glu Gly Gly Arg Gly Asp Met Ser Glu Asn Phe Val
1 5 10 15

Ala Phe Cys Val Ala Cys Ala Arg Gly Val Thr Lys Asp Glu Met Lys
20 25 30

Tyr Val Asp Gly Arg Val Phe His Lys Glu Cys His Ala Arg His Gly
35 40 45

Gly Gln Ile Arg Phe Pro Asn Pro Glu Val Glu Gln Arg Val Ala Glu
50 55 60

Leu Lys Val Asp Leu Ile Gln Met Arg Asn Gln Leu Ala Glu Met Asn
65 70 75 80

Arg Ala Ser Gly Asp Gly Gly Val His Ser Ser Ala Thr Ser Ala Ala
85 90 95

Glu Ala Glu Gln His Arg Ala Glu Leu Lys Val Gln Leu Val Gln Met
100 105 110

Arg Asn Gln Leu Ala Glu Met Asn Arg Lys Ala Pro Gly Lys Pro Ala
115 120 125

Arg Lys Lys Ala Ala Gly Lys Thr Ala Arg Arg Lys Ser Gly Lys Lys
130 135 140

Thr Val Arg Arg Lys Thr Gly Lys Arg Thr Ala Gly Lys Lys Ala Gly
145 150 155 160

Ala Arg Arg Lys Thr Thr Val Lys Arg Thr Ala Arg Arg Lys Thr Thr
165 170 175

Ala Lys Lys Ala Ala Gly Arg Lys Ala Gly Ala Arg Arg Lys Ala Thr
180 185 190

Val Lys Arg Thr Val His Lys Lys Ile Gly Val Arg Arg Lys Thr Thr
195 200 205

Ala Arg Arg Thr Ala Gly Lys Ser Thr Val Arg Arg Lys Ser Thr Val
210 215 220

Lys Arg Thr Val His Arg Lys Thr Gly Lys Lys Ala Val Val Arg Arg
225 230 235 240

Lys Ser Thr Val Lys Arg Thr Ala Arg Arg Pro Ala Gly Arg Lys Thr
245 250 255

Pro Gly Arg Ala Ala Arg Arg Ala Gly Ala Lys Arg Arg
260 265

<210> 7 

<211> 1569 

<212> DNA 

<213> Cenarchaeum symbiosum

<220> 
<221> CDS 

<222> (1)...(1569)
<400> 7 

atg cag tcg ctt gga cgg cta gac gag gcg tgc gcg gag ata tcg cgc 48
Met Gln Ser Leu Gly Arg Leu Asp Glu Ala Cys Ala Glu Ile Ser Arg
1 5 10 15

agc ctg ctt gaa tac gag tcc ccc acc gcc ggt gat gtc cgg acg gag 96
Ser Leu Leu Glu Tyr Glu Ser Pro Thr Ala Gly Asp Val Arg Thr Glu
20 25 30

atc aga agg gca tgc aca aag tac tcg ctc cgg agg atc cca aag aac 144
Ile Arg Arg Ala Cys Thr Lys Tyr Ser Leu Arg Arg Ile Pro Lys Asn
35 40 45

cgc gag ata ctg gcc acc gcc agg ggt cag gac ttt gac agg ctg cgc 192
Arg Glu Ile Leu Ala Thr Ala Arg Gly Gln Asp Phe Asp Arg Leu Arg
50 55 60

ccc ctg ctg ctc aaa aag ccc gta aag acc gca tcc ggg gtg gcc gtg 240
Pro Leu Leu Leu Lys Lys Pro Val Lys Thr Ala Ser Gly Val Ala Val
65 70 75 80

ata gca gtc atg ccc atg ccg tac gcg tgc ccc cac ggc aga tgc aca 288
Ile Ala Val Met Pro Met Pro Tyr Ala Cys Pro His Gly Arg Cys Thr
85 90 95

tac tgc ccc ggc ggg gag gcg tcg aac aca ccc aac agc tat acc ggc 336
Tyr Cys Pro Gly Gly Glu Ala Ser Asn Thr Pro Asn Ser Tyr Thr Gly
100 105 110

ggc gag ccc ata gcg gcg ggc gcc atg aac agc ggg tac gac ccg gaa 384
Gly Glu Pro Ile Ala Ala Gly Ala Met Asn Ser Gly Tyr Asp Pro Glu
115 120 125

gag cag gtc cgc gcg ggt ctg gcc cgg ctg cgc gcg cac ggc cac gat 432
Glu Gln Val Arg Ala Gly Leu Ala Arg Leu Arg Ala His Gly His Asp
130 135 140

gta gcc aag ctg gag ata gta ata gtg ggc ggc aca ttc ctg ttc atg 480
Val Ala Lys Leu Glu Ile Val Ile Val Gly Gly Thr Phe Leu Phe Met
145 150 155 160

ccg cag gag tac cag gag tgg ttc gtc aag tcc tgt tat gac gcg ctc 528
Pro Gln Glu Tyr Gln Glu Trp Phe Val Lys Ser Cys Tyr Asp Ala Leu
165 170 175

aac ggg tcc gct tcc gcg ggg atg gag gag gcc aag cac cga aat gaa 576
Asn Gly Ser Ala Ser Ala Gly Met Glu Glu Ala Lys His Arg Asn Glu
180 185 190

act gcc gtg cac aga aac gtg ggc ctc acc ata gag acc aag ccg gac 624
Thr Ala Val His Arg Asn Val Gly Leu Thr Ile Glu Thr Lys Pro Asp
195 200 205

tat tgc agg aca gag cat gtg gac gcg atg ctc ggc ttt ggg gcc acg 672
Tyr Cys Arg Thr Glu His Val Asp Ala Met Leu Gly Phe Gly Ala Thr
210 215 220

cgc gtg gag ata ggc gtg cag agc ctc cgg gag gag gtc tac ttg agg 720
Arg Val Glu Ile Gly Val Gln Ser Leu Arg Glu Glu Val Tyr Leu Arg
225 230 235 240

gtc aac cgg ggg cac ggc tac cag gat gtg aca gag tcg ttt gcc gcc 768
Val Asn Arg Gly His Gly Tyr Gln Asp Val Thr Glu Ser Phe Ala Ala
245 250 255

gcc agg gat gca ggc tac aag gtg gct gcc cac atg atg cca gga ctc 816
Ala Arg Asp Ala Gly Tyr Lys Val Ala Ala His Met Met Pro Gly Leu
260 265 270

ccg ggg gcc acc ccg gaa ggc gac atc gag gat ctg cgc atg ctg ttt 864
Pro Gly Ala Thr Pro Glu Gly Asp Ile Glu Asp Leu Arg Met Leu Phe
275 280 285

gag gat ccc gcg ctc agg ccg gac atg ctc aag gtg tac ccc gcg cta 912
Glu Asp Pro Ala Leu Arg Pro Asp Met Leu Lys Val Tyr Pro Ala Leu
290 295 300

gta gta agg ggc acc ccc atg tat gag gag tat tcg agg ggc gag tat 960
Val Val Arg Gly Thr Pro Met Tyr Glu Glu Tyr Ser Arg Gly Glu Tyr
305 310 315 320

tcc ccg tat acg gaa gag gag gtc atc cgg gtg ctc tcc gag gcc aag 1008
Ser Pro Tyr Thr Glu Glu Glu Val Ile Arg Val Leu Ser Glu Ala Lys
325 330 335

gcg cgc gtg ccc agg tgg gcg agg ata atg cgc gtg cag cgc gag ata 1056
Ala Arg Val Pro Arg Trp Ala Arg Ile Met Arg Val Gln Arg Glu Ile
340 345 350

cac ccc gac gag ata gtg gcc ggg ccg agg agc ggc aac ctc cgc cag 1104
His Pro Asp Glu Ile Val Ala Gly Pro Arg Ser Gly Asn Leu Arg Gln
355 360 365

ctg gtg cac aag agg ctc caa gag cag ggc cgc cga tgc cgc tgc ata 1152
Leu Val His Lys Arg Leu Gln Glu Gln Gly Arg Arg Cys Arg Cys Ile
370 375 380

cgg tgc agg gag gcg ggg ctc gcg ggg agg acc gtg ccg cag aag ctc 1200
Arg Cys Arg Glu Ala Gly Leu Ala Gly Arg Thr Val Pro Gln Lys Leu
385 390 395 400

cgt att gac agg gcg gac tat tcg gcc tcg ggg ggg aga gaa tcg ttt 1248
Arg Ile Asp Arg Ala Asp Tyr Ser Ala Ser Gly Gly Arg Glu Ser Phe
405 410 415

atc tcg ctt gta gac ggg gat gat gcc atc tat ggc ttt gtg cgc ctg 1296
Ile Ser Leu Val Asp Gly Asp Asp Ala Ile Tyr Gly Phe Val Arg Leu
420 425 430

cgc aag ccc tcc gga gca gca cac agg ccg gag gtc aca ccg gaa tcc 1344
Arg Lys Pro Ser Gly Ala Ala His Arg Pro Glu Val Thr Pro Glu Ser
435 440 445

tgc ata ata cgc gag ctg cac gta tac ggc agg tcg ctt ggc ctc ggc 1392
Cys Ile Ile Arg Glu Leu His Val Tyr Gly Arg Ser Leu Gly Leu Gly
450 455 460

gag agg ggc ggc ata cag cac tcg ggt cta ggc aga agg ctc gtc tca 1440
Glu Arg Gly Gly Ile Gln His Ser Gly Leu Gly Arg Arg Leu Val Ser
465 470 475 480

gaa gca gag tct gcc gcc cgt gag ctt ggc gcg ggc agg ctc ctt gtg 1488
Glu Ala Glu Ser Ala Ala Arg Glu Leu Gly Ala Gly Arg Leu Leu Val
485 490 495

ata agc gcc gtc ggg aca agg ggt tac tat cgc agg ctc gga tat tca 1536
Ile Ser Ala Val Gly Thr Arg Gly Tyr Tyr Arg Arg Leu Gly Tyr Ser
500 505 510

cgc acg ggc ccc tac atg ggg aag gtg ctc tga 1569
Arg Thr Gly Pro Tyr Met Gly Lys Val Leu *
515 520

<210> 8 
<211> 522 
<212> PRT 

<213> Cenarchaeum symbiosum 
<400> 8 

Met Gln Ser Leu Gly Arg Leu Asp Glu Ala Cys Ala Glu Ile Ser Arg
1 5 10 15

Ser Leu Leu Glu Tyr Glu Ser Pro Thr Ala Gly Asp Val Arg Thr Glu
20 25 30

Ile Arg Arg Ala Cys Thr Lys Tyr Ser Leu Arg Arg Ile Pro Lys Asn
35 40 45

Arg Glu Ile Leu Ala Thr Ala Arg Gly Gln Asp Phe Asp Arg Leu Arg
50 55 60

Pro Leu Leu Leu Lys Lys Pro Val Lys Thr Ala Ser Gly Val Ala Val
65 70 75 80

Ile Ala Val Met Pro Met Pro Tyr Ala Cys Pro His Gly Arg Cys Thr
85 90 95

Tyr Cys Pro Gly Gly Glu Ala Ser Asn Thr Pro Asn Ser Tyr Thr Gly
100 105 110

Gly Glu Pro Ile Ala Ala Gly Ala Met Asn Ser Gly Tyr Asp Pro Glu
115 120 125

Glu Gln Val Arg Ala Gly Leu Ala Arg Leu Arg Ala His Gly His Asp
130 135 140

Val Ala Lys Leu Glu Ile Val Ile Val Gly Gly Thr Phe Leu Phe Met
145 150 155 160

Pro Gln Glu Tyr Gln Glu Trp Phe Val Lys Ser Cys Tyr Asp Ala Leu
165 170 175

Asn Gly Ser Ala Ser Ala Gly Met Glu Glu Ala Lys His Arg Asn Glu
180 185 190

Thr Ala Val His Arg Asn Val Gly Leu Thr Ile Glu Thr Lys Pro Asp
195 200 205

Tyr Cys Arg Thr Glu His Val Asp Ala Met Leu Gly Phe Gly Ala Thr
210 215 220

Arg Val Glu Ile Gly Val Gln Ser Leu Arg Glu Glu Val Tyr Leu Arg
225 230 235 240

Val Asn Arg Gly His Gly Tyr Gln Asp Val Thr Glu Ser Phe Ala Ala
245 250 255

Ala Arg Asp Ala Gly Tyr Lys Val Ala Ala His Met Met Pro Gly Leu
260 265 270

Pro Gly Ala Thr Pro Glu Gly Asp Ile Glu Asp Leu Arg Met Leu Phe
275 280 285

Glu Asp Pro Ala Leu Arg Pro Asp Met Leu Lys Val Tyr Pro Ala Leu
290 295 300

Val Val Arg Gly Thr Pro Met Tyr Glu Glu Tyr Ser Arg Gly Glu Tyr
305 310 315 320

Ser Pro Tyr Thr Glu Glu Glu Val Ile Arg Val Leu Ser Glu Ala Lys
325 330 335

Ala Arg Val Pro Arg Trp Ala Arg Ile Met Arg Val Gln Arg Glu Ile
340 345 350

His Pro Asp Glu Ile Val Ala Gly Pro Arg Ser Gly Asn Leu Arg Gln
355 360 365

Leu Val His Lys Arg Leu Gln Glu Gln Gly Arg Arg Cys Arg Cys Ile
370 375 380

Arg Cys Arg Glu Ala Gly Leu Ala Gly Arg Thr Val Pro Gln Lys Leu
385 390 395 400

Arg Ile Asp Arg Ala Asp Tyr Ser Ala Ser Gly Gly Arg Glu Ser Phe
405 410 415

Ile Ser Leu Val Asp Gly Asp Asp Ala Ile Tyr Gly Phe Val Arg Leu
420 425 430

Arg Lys Pro Ser Gly Ala Ala His Arg Pro Glu Val Thr Pro Glu Ser
435 440 445

Cys Ile Ile Arg Glu Leu His Val Tyr Gly Arg Ser Leu Gly Leu Gly
450 455 460

Glu Arg Gly Gly Ile Gln His Ser Gly Leu Gly Arg Arg Leu Val Ser
465 470 475 480

Glu Ala Glu Ser Ala Ala Arg Glu Leu Gly Ala Gly Arg Leu Leu Val
485 490 495

Ile Ser Ala Val Gly Thr Arg Gly Tyr Tyr Arg Arg Leu Gly Tyr Ser
500 505 510

Arg Thr Gly Pro Tyr Met Gly Lys Val Leu
515 520

<210> 9
<211> 1575
<212> DNA

<213> Cenarchaeum symbiosum

<220>
<221> CDS

<222> (1)...(1575)
<400> 9

atg gag acg ata ggc cgc ggc acc tgg ata gac aag ctg gcg cat gaa 48
Met Glu Thr Ile Gly Arg Gly Thr Trp Ile Asp Lys Leu Ala His Glu
1 5 10 15

ctg gta gag cgc gaa gag gcc ctc ggc cgg gat aca gag atg ata aac 96
Leu Val Glu Arg Glu Glu Ala Leu Gly Arg Asp Thr Glu Met Ile Asn
20 25 30

gtc gag agc ggc ctt ggc gcg tcc ggg ata ccc cac atg ggg agc ctc 144
Val Glu Ser Gly Leu Gly Ala Ser Gly Ile Pro His Met Gly Ser Leu
35 40 45

ggg gat gca gtc agg gcg tac ggc gtg ggg ctc gcc gtc ggc gac atg 192
Gly Asp Ala Val Arg Ala Tyr Gly Val Gly Leu Ala Val Gly Asp Met
50 55 60

ggg cac agc ttc cgg ctc ata gcg tac ttt gac gac ctc gac ggg ctc 240
Gly His Ser Phe Arg Leu Ile Ala Tyr Phe Asp Asp Leu Asp Gly Leu
65 70 75 80

cgc aag gtc ccc gag ggc atg cca tcc tcg cta gaa gag cac ata gcc 288
Arg Lys Val Pro Glu Gly Met Pro Ser Ser Leu Glu Glu His Ile Ala
85 90 95

cgt ccc gtc tcg gcg ata ccc gac ccc tac ggg tgc cac gat tcc tac 336
Arg Pro Val Ser Ala Ile Pro Asp Pro Tyr Gly Cys His Asp Ser Tyr
100 105 110

ggc atg cac atg agc ggc ctg ctg cta gag ggg ctc gac gca ctg ggc 384
Gly Met His Met Ser Gly Leu Leu Leu Glu Gly Leu Asp Ala Leu Gly
115 120 125

ata gag tat gac ttt agg cgg gca agg gac acg tac cgc gac ggc ctg 432
Ile Glu Tyr Asp Phe Arg Arg Ala Arg Asp Thr Tyr Arg Asp Gly Leu
130 135 140

ctc gca gaa cag atc cac agg ata cta tcg aac agc tcg gta ata ggg 480
Leu Ala Glu Gln Ile His Arg Ile Leu Ser Asn Ser Ser Val Ile Gly
145 150 155 160

gag aag ata gcc gag atg gtg ggc cag gaa aag ttt cgc agc agc ctg 528
Glu Lys Ile Ala Glu Met Val Gly Gln Glu Lys Phe Arg Ser Ser Leu
165 170 175

ccg tac ttt gca gtc tgt gaa cag tgc ggg aag atg tac acg gcc gag 576
Pro Tyr Phe Ala Val Cys Glu Gln Cys Gly Lys Met Tyr Thr Ala Glu
180 185 190

tcc gtt gaa tac ctg gca gac agc cgc aag gtg cgg tac agg tgc ggc 624
Ser Val Glu Tyr Leu Ala Asp Ser Arg Lys Val Arg Tyr Arg Cys Gly
195 200 205

gac gec gag gta ggc gga aga aag ate gec ggc tgc ggg cac gag ggc 672 
Asp Ala Glu Val Gly Gly Arg Lys He Ala Gly Cys Gly His Glu Gly 
210 215 220 

gag gcg gac acg ggc gga gec ggc ggc aag etc gec tgg aag gtg gag 720 
Glu Ala Asp Thr Gly Gly Ala Gly Gly Lys Leu Ala Trp Lys Val Glu 
225 230 235 240 

ttt gee gca agg tgg cag gcg ttt gat gta cgc ttt gag gca tac ggc 768 
Phe Ala Ala Arg Trp Gin Ala Phe Asp Val Arg Phe Glu Ala Tyr Gly 
245 250 255 

aag gac ate atg gac tct gta agg ata aac gac tgg gtc tec gac gag 816 
Lys Asp He Met Asp Ser Val Arg He Asn Asp Trp Val Ser Asp Glu 
260 265 270 

ata eta tec age ccg cac ccc cac cat aca agg tac gag atg ttc etc 864 
He Leu Ser Ser Pro His Pro His His Thr Arg Tyr Glu Met Phe Leu 
275 2B0 285 

gac aag ggc ggc aaa aag ata tea aag teg tea gga aac gtg gtc acg 912 
Asp Lys Gly Gly Lys Lys He Ser Lys Ser Ser Gly Asn Val Val Thr 
290 295 300 

ccg cag aaa tgg etc agg tac ggc ace ccc cag teg ata ctg etc etc 960 
Pro Gin Lys Trp Leu Arg Tyr Gly Thr Pro Gin Ser He Leu Leu Leu 
305 310 315 . 320 

atg tac aag cgc ate acg ggg gcg egg gag ctt ggc etc gag gat gtg 100B 
Met Tyr Lys Arg He Thr Gly Ala Arg Glu Leu Gly Leu Glu Asp Val 
325 330 335 

cca tec ctg atg gac gag tac ggc gat ctt cag cgc gag tac ttt gcg 1056 
Pro Ser Leu Met Asp Glu Tyr Gly Asp Leu Gin Arg Glu Tyr Phe Ala 
340 345 350 

gga ggg ggc agg ggc ggg aaa gec cgc gag gec aag aac agg ggg eta 1104 
Gly Gly Gly Arg Gly Gly Lys Ala Arg Glu Ala Lys Asn Arg Gly Leu 
355 360 365 

ttc gag tat acg aac ctg ctg gag gca cag gag ggg ccg egg ccg cat 1152 
Phe Glu Tyr Thr Asn Leu Leu Glu Ala Gin Glu Gly Pro Arg Pro His 
370 375 380 

gcg ggc tac egg ctg eta gtc gag etc tec agg ctg ttc agg gag aat 1200 
Ala Gly Tyr Arg Leu Leu Val Glu Leu Ser Arg Leu Phe Arg Glu Asn 
385 390 395 400 

agg ace gag cgc gtc aca aaa aag etc gtc gag tac ggg gta att gac 124 8 

Arg Thr Glu Arg Val Thr Lys Lys Leu Val Glu Tyr Gly Val He Asp 
405 410 415 



ggg ccc teg ccc ggg ate gag egg etc ata gca ctg gec gga aac tat 
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Gly Pro Ser Pro Gly lie Glu Arg Leu He Ala Leu Ala Gly Asn Tyr 
420 425 430 

gca gac gac atg tat tct gcc gag aga aca gag gtg gag ctt gac ggg 1344 
Ala Asp Asp Met Tyr* Ser Ala Glu Arg Thr Glu Val Glu Leu Asp Gly 
435 440 445 

gcc aca agg ggg gcc etc teg gag ctg gca gaa atg etc ggt tec gcc 1392 
Ala Thr Arg Gly Ala Leu Ser Glu Leu Ala Glu Met Leu Gly Ser Ala 
450 455 460 

ccg gag ggc gga ctg cag gat gtc ata tac ggc gtg gcc aag tec cac 1440 
Pro Glu Gly Gly Leu Gin Asp Val He Tyr Gly Val Ala Lys Ser His 
465 470 475 480 

ggg gtg ccc ccg cgc gac ttt ttc aag gcg ctg tac agg ata ata ctg 1488 
Gly Val Pro Pro Arg Asp Phe Phe Lys Ala Leu Tyr Arg He He Leu 
485 490 495 

gat gca tec age ggg ccg agg ata ggc ccc ttc ata gag gac ata ggc 1536 
Asp Ala Ser Ser Gly Pro Arg He Gly Pro Phe He Glu Asp lie Gly 
500 505 510 

agg gag aag gtg gca ggt atg ata egg ggg cgc etc tga 1575 
Arg Glu Lys Val Ala Gly Met He Arg Gly Arg Leu * 
515 520 

<210> 10
<211> 524
<212> PRT
<213> Cenarchaeum symbiosum

<400> 10
Met Glu Thr Ile Gly Arg Gly Thr Trp Ile Asp Lys Leu Ala His Glu
1 5 10 15

Leu Val Glu Arg Glu Glu Ala Leu Gly Arg Asp Thr Glu Met Ile Asn
20 25 30

Val Glu Ser Gly Leu Gly Ala Ser Gly Ile Pro His Met Gly Ser Leu
35 40 45

Gly Asp Ala Val Arg Ala Tyr Gly Val Gly Leu Ala Val Gly Asp Met
50 55 60

Gly His Ser Phe Arg Leu Ile Ala Tyr Phe Asp Asp Leu Asp Gly Leu
65 70 75 80

Arg Lys Val Pro Glu Gly Met Pro Ser Ser Leu Glu Glu His Ile Ala
85 90 95

Arg Pro Val Ser Ala Ile Pro Asp Pro Tyr Gly Cys His Asp Ser Tyr
100 105 110

Gly Met His Met Ser Gly Leu Leu Leu Glu Gly Leu Asp Ala Leu Gly
115 120 125

Ile Glu Tyr Asp Phe Arg Arg Ala Arg Asp Thr Tyr Arg Asp Gly Leu
130 135 140

Leu Ala Glu Gln Ile His Arg Ile Leu Ser Asn Ser Ser Val Ile Gly
145 150 155 160

Glu Lys Ile Ala Glu Met Val Gly Gln Glu Lys Phe Arg Ser Ser Leu
165 170 175

Pro Tyr Phe Ala Val Cys Glu Gln Cys Gly Lys Met Tyr Thr Ala Glu
180 185 190

Ser Val Glu Tyr Leu Ala Asp Ser Arg Lys Val Arg Tyr Arg Cys Gly
195 200 205

Asp Ala Glu Val Gly Gly Arg Lys Ile Ala Gly Cys Gly His Glu Gly
210 215 220

Glu Ala Asp Thr Gly Gly Ala Gly Gly Lys Leu Ala Trp Lys Val Glu
225 230 235 240

Phe Ala Ala Arg Trp Gln Ala Phe Asp Val Arg Phe Glu Ala Tyr Gly
245 250 255

Lys Asp Ile Met Asp Ser Val Arg Ile Asn Asp Trp Val Ser Asp Glu
260 265 270

Ile Leu Ser Ser Pro His Pro His His Thr Arg Tyr Glu Met Phe Leu
275 280 285

Asp Lys Gly Gly Lys Lys Ile Ser Lys Ser Ser Gly Asn Val Val Thr
290 295 300

Pro Gln Lys Trp Leu Arg Tyr Gly Thr Pro Gln Ser Ile Leu Leu Leu
305 310 315 320

Met Tyr Lys Arg Ile Thr Gly Ala Arg Glu Leu Gly Leu Glu Asp Val
325 330 335

Pro Ser Leu Met Asp Glu Tyr Gly Asp Leu Gln Arg Glu Tyr Phe Ala
340 345 350

Gly Gly Gly Arg Gly Gly Lys Ala Arg Glu Ala Lys Asn Arg Gly Leu
355 360 365

Phe Glu Tyr Thr Asn Leu Leu Glu Ala Gln Glu Gly Pro Arg Pro His
370 375 380

Ala Gly Tyr Arg Leu Leu Val Glu Leu Ser Arg Leu Phe Arg Glu Asn
385 390 395 400

Arg Thr Glu Arg Val Thr Lys Lys Leu Val Glu Tyr Gly Val Ile Asp
405 410 415

Gly Pro Ser Pro Gly Ile Glu Arg Leu Ile Ala Leu Ala Gly Asn Tyr
420 425 430

Ala Asp Asp Met Tyr Ser Ala Glu Arg Thr Glu Val Glu Leu Asp Gly
435 440 445

Ala Thr Arg Gly Ala Leu Ser Glu Leu Ala Glu Met Leu Gly Ser Ala
450 455 460

Pro Glu Gly Gly Leu Gln Asp Val Ile Tyr Gly Val Ala Lys Ser His
465 470 475 480

Gly Val Pro Pro Arg Asp Phe Phe Lys Ala Leu Tyr Arg Ile Ile Leu
485 490 495

Asp Ala Ser Ser Gly Pro Arg Ile Gly Pro Phe Ile Glu Asp Ile Gly
500 505 510

Arg Glu Lys Val Ala Gly Met Ile Arg Gly Arg Leu
515 520

<210> 11
<211> 885
<212> DNA
<213> Cenarchaeum symbiosum

<220>
<221> CDS
<222> (1)...(885)

<400> 11
atg gag tca gcc ggt gag cag gca cct ggt gtg gta ctt cac gac tat 48
Met Glu Ser Ala Gly Glu Gln Ala Pro Gly Val Val Leu His Asp Tyr
1 5 10 15

ctt tca aaa ttg caa cag tat tcg ggg agg gac aca att cta tat gcg 96
Leu Ser Lys Leu Gln Gln Tyr Ser Gly Arg Asp Thr Ile Leu Tyr Ala
20 25 30

acc aac tgg atg acg gac gaa ccg cat acg cct aat gaa gct ctc ata 144
Thr Asn Trp Met Thr Asp Glu Pro His Thr Pro Asn Glu Ala Leu Ile
35 40 45

aca aat ggt gac ctg tat gga ttt atg agg atg atg cgt gat tta aag 192
Thr Asn Gly Asp Leu Tyr Gly Phe Met Arg Met Met Arg Asp Leu Lys
50 55 60

act aaa aaa ttg gat ctg ata ctc cac agt cct gga ggt tct gcc gag 240
Thr Lys Lys Leu Asp Leu Ile Leu His Ser Pro Gly Gly Ser Ala Glu
65 70 75 80

tct gca gaa tcg att gtc aca tac ctt cat gcg aaa tat gat gat att 288
Ser Ala Glu Ser Ile Val Thr Tyr Leu His Ala Lys Tyr Asp Asp Ile
85 90 95

cgg gtc atc ata ccg tat gcc gca atg tca gca gcc tcg atg ctt gct 336
Arg Val Ile Ile Pro Tyr Ala Ala Met Ser Ala Ala Ser Met Leu Ala
100 105 110

tgc gca tca aat tcc ctg gta atg ggc aaa cac tcg tct ata gga ccc 384
Cys Ala Ser Asn Ser Leu Val Met Gly Lys His Ser Ser Ile Gly Pro
115 120 125

gct gat ccc caa ttt att ttc cca acc aag att ggc atg caa ata atg 432
Ala Asp Pro Gln Phe Ile Phe Pro Thr Lys Ile Gly Met Gln Ile Met
130 135 140

tct gca cag ctt cta att gac gag ttg caa gaa gtg cag gtg gta tct 480
Ser Ala Gln Leu Leu Ile Asp Glu Leu Gln Glu Val Gln Val Val Ser
145 150 155 160

gaa aaa cat ccg ggc agg ctt ggc gca tgg ctt cca ttg tta gga caa 528
Glu Lys His Pro Gly Arg Leu Gly Ala Trp Leu Pro Leu Leu Gly Gln
165 170 175

tat cct cct gga ctg gtt caa aaa tgc att agc agc cag aaa cta gct 576
Tyr Pro Pro Gly Leu Val Gln Lys Cys Ile Ser Ser Gln Lys Leu Ala
180 185 190

gaa gtg ctt gta caa aaa tgg ctg gaa gac cac atg ttt gct ggc gag 624
Glu Val Leu Val Gln Lys Trp Leu Glu Asp His Met Phe Ala Gly Glu
195 200 205

tct gat gcg gca gaa aaa tca aaa aaa ata tct gga atg tta gct tct 672
Ser Asp Ala Ala Glu Lys Ser Lys Lys Ile Ser Gly Met Leu Ala Ser
210 215 220

cct gga aaa tat tac agt cat ggg aga tac ata tcg cga gag gag tgt 720
Pro Gly Lys Tyr Tyr Ser His Gly Arg Tyr Ile Ser Arg Glu Glu Cys
225 230 235 240

agg ggc atc ggt ttg aaa ata act gat cta gaa gcc gac caa gaa ttt 768
Arg Gly Ile Gly Leu Lys Ile Thr Asp Leu Glu Ala Asp Gln Glu Phe
245 250 255

cag gat ctg aca ttg tcg gta tct cat gca gcg gat atc ctg tct caa 816
Gln Asp Leu Thr Leu Ser Val Ser His Ala Ala Asp Ile Leu Ser Gln
260 265 270

ttt act cca atc aac aaa atc atc gcg aat cac ctc ggt aat tca gtt 864
Phe Thr Pro Ile Asn Lys Ile Ile Ala Asn His Leu Gly Asn Ser Val
275 280 285

atc agc aaa cca tca aca tag 885
Ile Ser Lys Pro Ser Thr *
290

<210> 12
<211> 294
<212> PRT
<213> Cenarchaeum symbiosum

<400> 12
Met Glu Ser Ala Gly Glu Gln Ala Pro Gly Val Val Leu His Asp Tyr
1 5 10 15

Leu Ser Lys Leu Gln Gln Tyr Ser Gly Arg Asp Thr Ile Leu Tyr Ala
20 25 30

Thr Asn Trp Met Thr Asp Glu Pro His Thr Pro Asn Glu Ala Leu Ile
35 40 45

Thr Asn Gly Asp Leu Tyr Gly Phe Met Arg Met Met Arg Asp Leu Lys
50 55 60

Thr Lys Lys Leu Asp Leu Ile Leu His Ser Pro Gly Gly Ser Ala Glu
65 70 75 80

Ser Ala Glu Ser Ile Val Thr Tyr Leu His Ala Lys Tyr Asp Asp Ile
85 90 95

Arg Val Ile Ile Pro Tyr Ala Ala Met Ser Ala Ala Ser Met Leu Ala
100 105 110

Cys Ala Ser Asn Ser Leu Val Met Gly Lys His Ser Ser Ile Gly Pro
115 120 125

Ala Asp Pro Gln Phe Ile Phe Pro Thr Lys Ile Gly Met Gln Ile Met
130 135 140

Ser Ala Gln Leu Leu Ile Asp Glu Leu Gln Glu Val Gln Val Val Ser
145 150 155 160

Glu Lys His Pro Gly Arg Leu Gly Ala Trp Leu Pro Leu Leu Gly Gln
165 170 175

Tyr Pro Pro Gly Leu Val Gln Lys Cys Ile Ser Ser Gln Lys Leu Ala
180 185 190

Glu Val Leu Val Gln Lys Trp Leu Glu Asp His Met Phe Ala Gly Glu
195 200 205

Ser Asp Ala Ala Glu Lys Ser Lys Lys Ile Ser Gly Met Leu Ala Ser
210 215 220

Pro Gly Lys Tyr Tyr Ser His Gly Arg Tyr Ile Ser Arg Glu Glu Cys
225 230 235 240

Arg Gly Ile Gly Leu Lys Ile Thr Asp Leu Glu Ala Asp Gln Glu Phe
245 250 255

Gln Asp Leu Thr Leu Ser Val Ser His Ala Ala Asp Ile Leu Ser Gln
260 265 270

Phe Thr Pro Ile Asn Lys Ile Ile Ala Asn His Leu Gly Asn Ser Val
275 280 285

Ile Ser Lys Pro Ser Thr
290



<210> 13
<211> 1305
<212> DNA
<213> Cenarchaeum symbiosum

<220>
<221> CDS
<222> (1)...(1305)

<400> 13
gtg gat cta gag cgc gag tac agg gca aag acc agg ggc tcg gcg ggg 48
Met Asp Leu Glu Arg Glu Tyr Arg Ala Lys Thr Arg Gly Ser Ala Gly
1 5 10 15

ata ttt gcc cgg tcg aga agg tac cat gta ggg ggg gtc agc cac aac 96
Ile Phe Ala Arg Ser Arg Arg Tyr His Val Gly Gly Val Ser His Asn
20 25 30

ata agg tac tat gag ccg tac ccg ttt gtt aca agg tcg gcg cgc ggc 144
Ile Arg Tyr Tyr Glu Pro Tyr Pro Phe Val Thr Arg Ser Ala Arg Gly
35 40 45

aag cac ctt gtg gac gtc gac ggg aac aag tat acc gac tat tgg atg 192
Lys His Leu Val Asp Val Asp Gly Asn Lys Tyr Thr Asp Tyr Trp Met
50 55 60

ggg cac tgg agc ctg ata ctc ggc cac gcg ccg gcg caa gta agg tcg 240
Gly His Trp Ser Leu Ile Leu Gly His Ala Pro Ala Gln Val Arg Ser
65 70 75 80

gca gtg gag ggg cag ctg cgc cgc ggc tgg ata cac ggg acc gca aac 288
Ala Val Glu Gly Gln Leu Arg Arg Gly Trp Ile His Gly Thr Ala Asn
85 90 95

gag ccc acc atg cgg ctc tcg gag atc ata cgc ggg gcg gta aag gcg 336
Glu Pro Thr Met Arg Leu Ser Glu Ile Ile Arg Gly Ala Val Lys Ala
100 105 110

gca gag aag ata agg tat gtt aca tcc ggc acg gag gcc gtc atg tat 384
Ala Glu Lys Ile Arg Tyr Val Thr Ser Gly Thr Glu Ala Val Met Tyr
115 120 125

gcg gca agg atg gcg cgc gca cgc acg gga aaa aaa gtg ata gca aag 432
Ala Ala Arg Met Ala Arg Ala Arg Thr Gly Lys Lys Val Ile Ala Lys
130 135 140

gtc gac ggc ggc tgg cac gga tac gcg tcg ggg ctg cta aag tcg gtc 480
Val Asp Gly Gly Trp His Gly Tyr Ala Ser Gly Leu Leu Lys Ser Val
145 150 155 160

aac tgg ccg tac gat gtg ccc gag agc ggg ggg ctc gtc gac gag gag 528
Asn Trp Pro Tyr Asp Val Pro Glu Ser Gly Gly Leu Val Asp Glu Glu
165 170 175

cac acc gtg tcc atc ccg tac aac aat ctg gag gga tcc ctg gag gcg 576
His Thr Val Ser Ile Pro Tyr Asn Asn Leu Glu Gly Ser Leu Glu Ala
180 185 190

cta agg cgc gca ggg ggc gac ctt gca tgt gtc ata gtc gag ccg atg 624
Leu Arg Arg Ala Gly Gly Asp Leu Ala Cys Val Ile Val Glu Pro Met
195 200 205

ctt ggc ggc ggc ggc tgc ata ccg gca gaa ccg gac tat ctc cgc ggc 672
Leu Gly Gly Gly Gly Cys Ile Pro Ala Glu Pro Asp Tyr Leu Arg Gly
210 215 220

ata cag gag ttt gtg cat tcg aag ggt gca ctg ttc att ctc gac gag 720
Ile Gln Glu Phe Val His Ser Lys Gly Ala Leu Phe Ile Leu Asp Glu
225 230 235 240

ata gtc acg ggg ttc cgg ttc gac ttt ggc tgc gcg tac aag aaa atg 768
Ile Val Thr Gly Phe Arg Phe Asp Phe Gly Cys Ala Tyr Lys Lys Met
245 250 255

ggg ctg gac ccc gac gtg gtg gcg ctg gga aag ata gtc ggg ggc gga 816
Gly Leu Asp Pro Asp Val Val Ala Leu Gly Lys Ile Val Gly Gly Gly
260 265 270

ttc ccc ata ggt gtg gtg tgc ggc aag gac gag gtg atg tgc atc tcc 864
Phe Pro Ile Gly Val Val Cys Gly Lys Asp Glu Val Met Cys Ile Ser
275 280 285

gat acc ggc gcg cat gca aga acc gag agg gcg tac att ggc ggc ggc 912
Asp Thr Gly Ala His Ala Arg Thr Glu Arg Ala Tyr Ile Gly Gly Gly
290 295 300

acc ttt tct gca aac ccc gcg acg atg act gcg ggt gcc gcg gca ctc 960
Thr Phe Ser Ala Asn Pro Ala Thr Met Thr Ala Gly Ala Ala Ala Leu
305 310 315 320

ggt gca ctc agg gag aga agg ggc aca cta tac ccc aga ata aac tcc 1008
Gly Ala Leu Arg Glu Arg Arg Gly Thr Leu Tyr Pro Arg Ile Asn Ser
325 330 335

atg ggg gac gac gca agg gcg cgg ctc tcg agg ata ttc gac ggc agg 1056
Met Gly Asp Asp Ala Arg Ala Arg Leu Ser Arg Ile Phe Asp Gly Arg
340 345 350

gtt gca gtg acc ggc agg ggc tcg ctg ttc atg acg cac ttt aca ccg 1104
Val Ala Val Thr Gly Arg Gly Ser Leu Phe Met Thr His Phe Thr Pro
355 360 365

gat ggg gcc cgc agg ata tcc agc gcg gca gat gct gcc gcc tgc gat 1152
Asp Gly Ala Arg Arg Ile Ser Ser Ala Ala Asp Ala Ala Ala Cys Asp
370 375 380

gtg cat ctg ctg cac agg tac cac ctg gac atg att aca agg gac ggc 1200
Val His Leu Leu His Arg Tyr His Leu Asp Met Ile Thr Arg Asp Gly
385 390 395 400

ata ttc ttt ctg cca ggc aag ctg ggg gcc ata tct gcc gcc cac tca 1248
Ile Phe Phe Leu Pro Gly Lys Leu Gly Ala Ile Ser Ala Ala His Ser
405 410 415

agg gcg gac ctt ggg gcc atg tat tcg gcg tct gag cgc ttt gcg ggg 1296
Arg Ala Asp Leu Gly Ala Met Tyr Ser Ala Ser Glu Arg Phe Ala Gly
420 425 430

gga ctg tga 1305
Gly Leu *

<210> 14
<211> 434
<212> PRT
<213> Cenarchaeum symbiosum

<400> 14
Met Asp Leu Glu Arg Glu Tyr Arg Ala Lys Thr Arg Gly Ser Ala Gly
1 5 10 15

Ile Phe Ala Arg Ser Arg Arg Tyr His Val Gly Gly Val Ser His Asn
20 25 30

Ile Arg Tyr Tyr Glu Pro Tyr Pro Phe Val Thr Arg Ser Ala Arg Gly
35 40 45

Lys His Leu Val Asp Val Asp Gly Asn Lys Tyr Thr Asp Tyr Trp Met
50 55 60

Gly His Trp Ser Leu Ile Leu Gly His Ala Pro Ala Gln Val Arg Ser
65 70 75 80

Ala Val Glu Gly Gln Leu Arg Arg Gly Trp Ile His Gly Thr Ala Asn
85 90 95

Glu Pro Thr Met Arg Leu Ser Glu Ile Ile Arg Gly Ala Val Lys Ala
100 105 110

Ala Glu Lys Ile Arg Tyr Val Thr Ser Gly Thr Glu Ala Val Met Tyr
115 120 125

Ala Ala Arg Met Ala Arg Ala Arg Thr Gly Lys Lys Val Ile Ala Lys
130 135 140

Val Asp Gly Gly Trp His Gly Tyr Ala Ser Gly Leu Leu Lys Ser Val
145 150 155 160

Asn Trp Pro Tyr Asp Val Pro Glu Ser Gly Gly Leu Val Asp Glu Glu
165 170 175

His Thr Val Ser Ile Pro Tyr Asn Asn Leu Glu Gly Ser Leu Glu Ala
180 185 190

Leu Arg Arg Ala Gly Gly Asp Leu Ala Cys Val Ile Val Glu Pro Met
195 200 205

Leu Gly Gly Gly Gly Cys Ile Pro Ala Glu Pro Asp Tyr Leu Arg Gly
210 215 220

Ile Gln Glu Phe Val His Ser Lys Gly Ala Leu Phe Ile Leu Asp Glu
225 230 235 240

Ile Val Thr Gly Phe Arg Phe Asp Phe Gly Cys Ala Tyr Lys Lys Met
245 250 255

Gly Leu Asp Pro Asp Val Val Ala Leu Gly Lys Ile Val Gly Gly Gly
260 265 270

Phe Pro Ile Gly Val Val Cys Gly Lys Asp Glu Val Met Cys Ile Ser
275 280 285

Asp Thr Gly Ala His Ala Arg Thr Glu Arg Ala Tyr Ile Gly Gly Gly
290 295 300

Thr Phe Ser Ala Asn Pro Ala Thr Met Thr Ala Gly Ala Ala Ala Leu
305 310 315 320

Gly Ala Leu Arg Glu Arg Arg Gly Thr Leu Tyr Pro Arg Ile Asn Ser
325 330 335

Met Gly Asp Asp Ala Arg Ala Arg Leu Ser Arg Ile Phe Asp Gly Arg
340 345 350

Val Ala Val Thr Gly Arg Gly Ser Leu Phe Met Thr His Phe Thr Pro
355 360 365

Asp Gly Ala Arg Arg Ile Ser Ser Ala Ala Asp Ala Ala Ala Cys Asp
370 375 380

Val His Leu Leu His Arg Tyr His Leu Asp Met Ile Thr Arg Asp Gly
385 390 395 400

Ile Phe Phe Leu Pro Gly Lys Leu Gly Ala Ile Ser Ala Ala His Ser
405 410 415

Arg Ala Asp Leu Gly Ala Met Tyr Ser Ala Ser Glu Arg Phe Ala Gly
420 425 430

Gly Leu

<210> 15
<211> 816
<212> DNA
<213> Cenarchaeum symbiosum

<220>
<221> CDS
<222> (1)...(816)

<400> 15
atg ata ctc ttc ggc aag agc gac ccc tcc gac ctg ctc cgc cag gcc 48
Met Ile Leu Phe Gly Lys Ser Asp Pro Ser Asp Leu Leu Arg Gln Ala
1 5 10 15

gat ctt ttg tgc agt ggg aac aag tac aag gcg gca gtg ggc ctg tac 96
Asp Leu Leu Cys Ser Gly Asn Lys Tyr Lys Ala Ala Val Gly Leu Tyr
20 25 30

agc agg ata ctc aag gac gac ccg cag aac agg atg gtc ctg cag aga 144
Ser Arg Ile Leu Lys Asp Asp Pro Gln Asn Arg Met Val Leu Gln Arg
35 40 45

aag ggc ctc gcc ctc aac agg ata aga agg tac tct gat gcc ata acg 192
Lys Gly Leu Ala Leu Asn Arg Ile Arg Arg Tyr Ser Asp Ala Ile Thr
50 55 60

tgc ttt gat ctg ctg ctc gag ctg gat gat ggc gac gcg cct gca tac 240
Cys Phe Asp Leu Leu Leu Glu Leu Asp Asp Gly Asp Ala Pro Ala Tyr
65 70 75 80

aac aac aag gcc ata gcc cag gcc gag ctg ggc gat acg gca tcc gcc 288
Asn Asn Lys Ala Ile Ala Gln Ala Glu Leu Gly Asp Thr Ala Ser Ala
85 90 95

ctg gag aac tat ggc agg gcc atc gaa gcc agc ccc agg tac gcg ccg 336
Leu Glu Asn Tyr Gly Arg Ala Ile Glu Ala Ser Pro Arg Tyr Ala Pro
100 105 110

gcg tac ttt aac agg gcc gtc ctg ctc gac agg ctc ggc gag cac gaa 384
Ala Tyr Phe Asn Arg Ala Val Leu Leu Asp Arg Leu Gly Glu His Glu
115 120 125

gac gcg ctg ccg gac ctc gac aag gcg aca agg ctg gac agg gac aag 432
Asp Ala Leu Pro Asp Leu Asp Lys Ala Thr Arg Leu Asp Arg Asp Lys
130 135 140

gcc aac ccg agg ttc tac aag ggg ata gtc ctg gga aag atg ggc cgg 480
Ala Asn Pro Arg Phe Tyr Lys Gly Ile Val Leu Gly Lys Met Gly Arg
145 150 155 160

cat gca gag gcg ctg tcc tgc ttc aag gag gtg tgc agg gcg gac cac 528
His Ala Glu Ala Leu Ser Cys Phe Lys Glu Val Cys Arg Ala Asp His
165 170 175

ggc cac gcc gac tca cag ttc cac gtg gcg ata gag gta gcc gag ctc 576
Gly His Ala Asp Ser Gln Phe His Val Ala Ile Glu Val Ala Glu Leu
180 185 190

ggc aaa cac gcc gaa gcc ctc ggt gag ctt gcg gca ctg ccc gca gag 624
Gly Lys His Ala Glu Ala Leu Gly Glu Leu Ala Ala Leu Pro Ala Glu
195 200 205

tac cgc gag aac gca aac gtt ctc tac gcc cgg gcg cgc agc ctc gcc 672
Tyr Arg Glu Asn Ala Asn Val Leu Tyr Ala Arg Ala Arg Ser Leu Ala
210 215 220

ggc ctg gac agg tac gac gag tcc att gca cac ctg caa aag gcc gcc 720
Gly Leu Asp Arg Tyr Asp Glu Ser Ile Ala His Leu Gln Lys Ala Ala
225 230 235 240

aga aag gac tcc aag aca ata aaa aag tgg gcc cgc gcc gag aag gcc 768
Arg Lys Asp Ser Lys Thr Ile Lys Lys Trp Ala Arg Ala Glu Lys Ala
245 250 255

ttt gat cat ata cgg gat gat ccc agg ttc aaa aag ata gcc ggg taa 816
Phe Asp His Ile Arg Asp Asp Pro Arg Phe Lys Lys Ile Ala Gly *
260 265 270

<210> 16
<211> 271
<212> PRT
<213> Cenarchaeum symbiosum

<400> 16
Met Ile Leu Phe Gly Lys Ser Asp Pro Ser Asp Leu Leu Arg Gln Ala
1 5 10 15

Asp Leu Leu Cys Ser Gly Asn Lys Tyr Lys Ala Ala Val Gly Leu Tyr
20 25 30

Ser Arg Ile Leu Lys Asp Asp Pro Gln Asn Arg Met Val Leu Gln Arg
35 40 45

Lys Gly Leu Ala Leu Asn Arg Ile Arg Arg Tyr Ser Asp Ala Ile Thr
50 55 60

Cys Phe Asp Leu Leu Leu Glu Leu Asp Asp Gly Asp Ala Pro Ala Tyr
65 70 75 80

Asn Asn Lys Ala Ile Ala Gln Ala Glu Leu Gly Asp Thr Ala Ser Ala
85 90 95

Leu Glu Asn Tyr Gly Arg Ala Ile Glu Ala Ser Pro Arg Tyr Ala Pro
100 105 110

Ala Tyr Phe Asn Arg Ala Val Leu Leu Asp Arg Leu Gly Glu His Glu
115 120 125

Asp Ala Leu Pro Asp Leu Asp Lys Ala Thr Arg Leu Asp Arg Asp Lys
130 135 140

Ala Asn Pro Arg Phe Tyr Lys Gly Ile Val Leu Gly Lys Met Gly Arg
145 150 155 160

His Ala Glu Ala Leu Ser Cys Phe Lys Glu Val Cys Arg Ala Asp His
165 170 175

Gly His Ala Asp Ser Gln Phe His Val Ala Ile Glu Val Ala Glu Leu
180 185 190

Gly Lys His Ala Glu Ala Leu Gly Glu Leu Ala Ala Leu Pro Ala Glu
195 200 205

Tyr Arg Glu Asn Ala Asn Val Leu Tyr Ala Arg Ala Arg Ser Leu Ala
210 215 220

Gly Leu Asp Arg Tyr Asp Glu Ser Ile Ala His Leu Gln Lys Ala Ala
225 230 235 240

Arg Lys Asp Ser Lys Thr Ile Lys Lys Trp Ala Arg Ala Glu Lys Ala
245 250 255

Phe Asp His Ile Arg Asp Asp Pro Arg Phe Lys Lys Ile Ala Gly
260 265 270

<210> 17
<211> 696
<212> DNA
<213> Cenarchaeum symbiosum

<220>
<221> CDS
<222> (1)...(696)

<400> 17
gtg act gac aag aca agg atc atc gtc ctg cgc aac gcc atg act gaa 48
Met Thr Asp Lys Thr Arg Ile Ile Val Leu Arg Asn Ala Met Thr Glu
1 5 10 15

cag tcc gcc cgg gcc atg atc gag gca aaa aag acg ggg cca ttc agg 96
Gln Ser Ala Arg Ala Met Ile Glu Ala Lys Lys Thr Gly Pro Phe Arg
20 25 30

gcc atg atg agg gcg ccc cca aag gag gac gtc cat gta cat tcc gta 144
Ala Met Met Arg Ala Pro Pro Lys Glu Asp Val His Val His Ser Val
35 40 45

agg ctc gtc cac gag gcg ctc atc cgc gtc tcc gcc cgg tac tcg gcc 192
Arg Leu Val His Glu Ala Leu Ile Arg Val Ser Ala Arg Tyr Ser Ala
50 55 60

gac ttt ttc aga agg gcc gtg cac ccg atc aag gtg gat cag aac gtg 240
Asp Phe Phe Arg Arg Ala Val His Pro Ile Lys Val Asp Gln Asn Val
65 70 75 80

atc gag gtg gtg ctg ggc gac ggc gtc ttc ccg ata agg tca aag tcg 288
Ile Glu Val Val Leu Gly Asp Gly Val Phe Pro Ile Arg Ser Lys Ser
85 90 95

cgc ata cgc aag acc ctg tcc gcc ggg cgc ggc aag aac agg gtc gat 336
Arg Ile Arg Lys Thr Leu Ser Ala Gly Arg Gly Lys Asn Arg Val Asp
100 105 110

ctg gaa ctc gag gag cac gta tac gcg gaa tca gag ggc gtg atg tgc 384
Leu Glu Leu Glu Glu His Val Tyr Ala Glu Ser Glu Gly Val Met Cys
115 120 125

ctt gac cgg cac ggc ggg gag acc ggc ttt ccc tac aag acg ggg acc 432
Leu Asp Arg His Gly Gly Glu Thr Gly Phe Pro Tyr Lys Thr Gly Thr
130 135 140

ggc gcg gtc gag ccg tac ccg cgg cgc atg ctt gat tcg tcg gag aat 480
Gly Ala Val Glu Pro Tyr Pro Arg Arg Met Leu Asp Ser Ser Glu Asn
145 150 155 160

gtg cgg cgc ccg gag ata gac acc ggg gtg gcg ctg gaa aaa ctc cgg 528
Val Arg Arg Pro Glu Ile Asp Thr Gly Val Ala Leu Glu Lys Leu Arg
165 170 175

gta aag ctc cgc ggg ccc ccg cct gac ggc atg cgc gac ctc cgg gag 576
Val Lys Leu Arg Gly Pro Pro Pro Asp Gly Met Arg Asp Leu Arg Glu
180 185 190

gag ttt gca gtc aga tcg gtc gaa gaa gtg tat gcc cct gtc tac gag 624
Glu Phe Ala Val Arg Ser Val Glu Glu Val Tyr Ala Pro Val Tyr Glu
195 200 205

tcg cgg ctt gtg ggg ccc aaa aaa aag gtc cgg ata atg cgg ata gac 672
Ser Arg Leu Val Gly Pro Lys Lys Lys Val Arg Ile Met Arg Ile Asp
210 215 220

gcg gca aga aaa aag atg ctg tag 696
Ala Ala Arg Lys Lys Met Leu *
225 230

<210> 18
<211> 231
<212> PRT
<213> Cenarchaeum symbiosum

<400> 18
Met Thr Asp Lys Thr Arg Ile Ile Val Leu Arg Asn Ala Met Thr Glu
1 5 10 15

Gln Ser Ala Arg Ala Met Ile Glu Ala Lys Lys Thr Gly Pro Phe Arg
20 25 30

Ala Met Met Arg Ala Pro Pro Lys Glu Asp Val His Val His Ser Val
35 40 45

Arg Leu Val His Glu Ala Leu Ile Arg Val Ser Ala Arg Tyr Ser Ala
50 55 60

Asp Phe Phe Arg Arg Ala Val His Pro Ile Lys Val Asp Gln Asn Val
65 70 75 80

Ile Glu Val Val Leu Gly Asp Gly Val Phe Pro Ile Arg Ser Lys Ser
85 90 95

Arg Ile Arg Lys Thr Leu Ser Ala Gly Arg Gly Lys Asn Arg Val Asp
100 105 110

Leu Glu Leu Glu Glu His Val Tyr Ala Glu Ser Glu Gly Val Met Cys
115 120 125

Leu Asp Arg His Gly Gly Glu Thr Gly Phe Pro Tyr Lys Thr Gly Thr
130 135 140

Gly Ala Val Glu Pro Tyr Pro Arg Arg Met Leu Asp Ser Ser Glu Asn
145 150 155 160

Val Arg Arg Pro Glu Ile Asp Thr Gly Val Ala Leu Glu Lys Leu Arg
165 170 175

Val Lys Leu Arg Gly Pro Pro Pro Asp Gly Met Arg Asp Leu Arg Glu
180 185 190

Glu Phe Ala Val Arg Ser Val Glu Glu Val Tyr Ala Pro Val Tyr Glu
195 200 205

Ser Arg Leu Val Gly Pro Lys Lys Lys Val Arg Ile Met Arg Ile Asp
210 215 220

Ala Ala Arg Lys Lys Met Leu
225 230

<210> 19
<211> 378
<212> DNA
<213> Cenarchaeum symbiosum

<220>
<221> CDS
<222> (1)...(378)

<400> 19
atg agg tca gaa gag agg ccg ggt cac att gaa aag ttc cta aag agg 48
Met Arg Ser Glu Glu Arg Pro Gly His Ile Glu Lys Phe Leu Lys Arg
1 5 10 15

gcg gac aag gcg atc gac agc gcg gtc gag cag ggc gtc aag agg gcc 96
Ala Asp Lys Ala Ile Asp Ser Ala Val Glu Gln Gly Val Lys Arg Ala
20 25 30

gac gag ata cta gac gat gca gtc gag ctc ggc aag att acg gtg ggc 144
Asp Glu Ile Leu Asp Asp Ala Val Glu Leu Gly Lys Ile Thr Val Gly
35 40 45

gag gcg cag agg agg agc gat gtg ctg ctc aaa cag gcc gag cgg gag 192
Glu Ala Gln Arg Arg Ser Asp Val Leu Leu Lys Gln Ala Glu Arg Glu
50 55 60

agc agg cgg ctc aag tcc aag ggc gcc aaa aag ctc gaa aag ggc ata 240
Ser Arg Arg Leu Lys Ser Lys Gly Ala Lys Lys Leu Glu Lys Gly Ile
65 70 75 80

ggc gcc gca aaa aag atg gca gca ggc aag ggc gac gcg ctc gag acg 288
Gly Ala Ala Lys Lys Met Ala Ala Gly Lys Gly Asp Ala Leu Glu Thr
85 90 95

ctc gca aag ctc ggc gag ctc aga aag gcg ggg atc ata acg gag aaa 336
Leu Ala Lys Leu Gly Glu Leu Arg Lys Ala Gly Ile Ile Thr Glu Lys
100 105 110

gag ttt cgc gcc aaa aag aaa aag ctc ctc gca gag atc tga 378
Glu Phe Arg Ala Lys Lys Lys Lys Leu Leu Ala Glu Ile *
115 120 125

<210> 20
<211> 125
<212> PRT
<213> Cenarchaeum symbiosum

<400> 20
Met Arg Ser Glu Glu Arg Pro Gly His Ile Glu Lys Phe Leu Lys Arg
1 5 10 15

Ala Asp Lys Ala Ile Asp Ser Ala Val Glu Gln Gly Val Lys Arg Ala
20 25 30

Asp Glu Ile Leu Asp Asp Ala Val Glu Leu Gly Lys Ile Thr Val Gly
35 40 45

Glu Ala Gln Arg Arg Ser Asp Val Leu Leu Lys Gln Ala Glu Arg Glu
50 55 60

Ser Arg Arg Leu Lys Ser Lys Gly Ala Lys Lys Leu Glu Lys Gly Ile
65 70 75 80

Gly Ala Ala Lys Lys Met Ala Ala Gly Lys Gly Asp Ala Leu Glu Thr
85 90 95

Leu Ala Lys Leu Gly Glu Leu Arg Lys Ala Gly Ile Ile Thr Glu Lys
100 105 110

Glu Phe Arg Ala Lys Lys Lys Lys Leu Leu Ala Glu Ile
115 120 125

<210> 21
<211> 600
<212> DNA
<213> Cenarchaeum symbiosum

<220>
<221> CDS
<222> (1)...(600)

<400> 21
atg tcc cag acg ggg gcc ccg ggc ggg cat gcc tgc acg cca tac acg 48
Met Ser Gln Thr Gly Ala Pro Gly Gly His Ala Cys Thr Pro Tyr Thr
1 5 10 15

cac gat cac gcc tcg atc gag ctc aag gac gcg tgg gcc tcg tcg agg 96
His Asp His Ala Ser Ile Glu Leu Lys Asp Ala Trp Ala Ser Ser Arg
20 25 30

aac gtc cgc gag atg tac ttt gtg acc gcc acg ttc tcg tcc gag agc 144
Asn Val Arg Glu Met Tyr Phe Val Thr Ala Thr Phe Ser Ser Glu Ser
35 40 45

cag ccg tac ttt gca ccg cag gcc aac cac tac ctg ctg gca agg ttc 192
Gln Pro Tyr Phe Ala Pro Gln Ala Asn His Tyr Leu Leu Ala Arg Phe
50 55 60

aag gac gcc ccc aga atg atc aag gcg gtg ggc cgg ggg gag ggc gca 240
Lys Asp Ala Pro Arg Met Ile Lys Ala Val Gly Arg Gly Glu Gly Ala
65 70 75 80

tcc tat gtg ttt agc atg gac gag gac ata ttc gag agg gag tcc ccc 288
Ser Tyr Val Phe Ser Met Asp Glu Asp Ile Phe Glu Arg Glu Ser Pro
85 90 95

ggg gtg agc tat gta tcg gtg tac tat ctg gag tac ggc gat tcc gag 336
Gly Val Ser Tyr Val Ser Val Tyr Tyr Leu Glu Tyr Gly Asp Ser Glu
100 105 110

gag gac ata tgc gag gtg gcg tcc gtg gtg ggg aga aag gag aag ata 384
Glu Asp Ile Cys Glu Val Ala Ser Val Val Gly Arg Lys Glu Lys Ile
115 120 125

ggc agg gcg gga ata ggg cgc atg gac gtc tgc tcg agg gtg ccg cca 432
Gly Arg Ala Gly Ile Gly Arg Met Asp Val Cys Ser Arg Val Pro Pro
130 135 140

aag ttt gcc ttt ccg tac agc ggg aac ata ata gtc ctc gag gtc tcc 480
Lys Phe Ala Phe Pro Tyr Ser Gly Asn Ile Ile Val Leu Glu Val Ser
145 150 155 160

agc gag aag agc tac cag agc gtc aac aag tac tgc gag aag acg cgg 528
Ser Glu Lys Ser Tyr Gln Ser Val Asn Lys Tyr Cys Glu Lys Thr Arg
165 170 175

cgc gag gtc atc cgc aag ggg ata acg atg acc aac ctt gtg agc ctg 576
Arg Glu Val Ile Arg Lys Gly Ile Thr Met Thr Asn Leu Val Ser Leu
180 185 190

tcc ata ctg gag cgg cta aag tag 600
Ser Ile Leu Glu Arg Leu Lys *
195

<210> 22
<211> 199
<212> PRT
<213> Cenarchaeum symbiosum

<400> 22
Met Ser Gln Thr Gly Ala Pro Gly Gly His Ala Cys Thr Pro Tyr Thr
1 5 10 15

His Asp His Ala Ser Ile Glu Leu Lys Asp Ala Trp Ala Ser Ser Arg
20 25 30

Asn Val Arg Glu Met Tyr Phe Val Thr Ala Thr Phe Ser Ser Glu Ser
35 40 45

Gln Pro Tyr Phe Ala Pro Gln Ala Asn His Tyr Leu Leu Ala Arg Phe
50 55 60

Lys Asp Ala Pro Arg Met Ile Lys Ala Val Gly Arg Gly Glu Gly Ala
65 70 75 80

Ser Tyr Val Phe Ser Met Asp Glu Asp Ile Phe Glu Arg Glu Ser Pro
85 90 95

Gly Val Ser Tyr Val Ser Val Tyr Tyr Leu Glu Tyr Gly Asp Ser Glu
100 105 110

Glu Asp Ile Cys Glu Val Ala Ser Val Val Gly Arg Lys Glu Lys Ile
115 120 125

Gly Arg Ala Gly Ile Gly Arg Met Asp Val Cys Ser Arg Val Pro Pro
130 135 140

Lys Phe Ala Phe Pro Tyr Ser Gly Asn Ile Ile Val Leu Glu Val Ser
145 150 155 160

Ser Glu Lys Ser Tyr Gln Ser Val Asn Lys Tyr Cys Glu Lys Thr Arg
165 170 175

Arg Glu Val Ile Arg Lys Gly Ile Thr Met Thr Asn Leu Val Ser Leu
180 185 190

Ser Ile Leu Glu Arg Leu Lys
195

<210> 23
<211> 810
<212> DNA
<213> Cenarchaeum symbiosum

<220>
<221> CDS
<222> (1)...(810)

<400> 23
ttg gct cgg cgc tac aag ccc cgg ata aag cag gtc cta cgc gag gtg 48
Met Ala Arg Arg Tyr Lys Pro Arg Ile Lys Gln Val Leu Arg Glu Val
1 5 10 15

ccc ctc aag aac gtg cac gtg tgg aag gac gcg cag gca agg agg ctg 96
Pro Leu Lys Asn Val His Val Trp Lys Asp Ala Gln Ala Arg Arg Leu
20 25 30

gac agg tcc agg gtg agg gag att gca aag tcg atc agg tcc gag ggc 144
Asp Arg Ser Arg Val Arg Glu Ile Ala Lys Ser Ile Arg Ser Glu Gly
35 40 45

ctg cag aac ccg ccc gta ata cag agg ggc ggc agg ggg ctg tac ctg 192
Leu Gln Asn Pro Pro Val Ile Gln Arg Gly Gly Arg Gly Leu Tyr Leu
50 55 60

ctc ata tcg ggg aac cac agg ctt gcg gcc cta aag cat ctg ggc gca 240
Leu Ile Ser Gly Asn His Arg Leu Ala Ala Leu Lys His Leu Gly Ala
65 70 75 80

aaa aag tcc aag ttt ctt gtg ata acc aag gat acg gag tac ggc ctg 288
Lys Lys Ser Lys Phe Leu Val Ile Thr Lys Asp Thr Glu Tyr Gly Leu
85 90 95

gag gac gca aag gcg gca tcg gtc gtg gag aac ctg cac cgg atg cag 336
Glu Asp Ala Lys Ala Ala Ser Val Val Glu Asn Leu His Arg Met Gln
100 105 110

atg agc ccc cgg gag ctc gcc gac gcg tgc agg ttt ctc gcc gag cag 384
Met Ser Pro Arg Glu Leu Ala Asp Ala Cys Arg Phe Leu Ala Glu Gln
115 120 125

atg acc cgc gcc gag gcc gca agg aag ctc ggc atg tcg atg ccc acg 432
Met Thr Arg Ala Glu Ala Ala Arg Lys Leu Gly Met Ser Met Pro Thr
130 135 140

ttc aaa aag tac cac ggc ttt gcg ggc gtg ccg gag aag atc aag gcg 480
Phe Lys Lys Tyr His Gly Phe Ala Gly Val Pro Glu Lys Ile Lys Ala
145 150 155 160

cta gtc ccc ggg acc ata tcc cgg gac gag gcg aca aag ctg tac cag 528
Leu Val Pro Gly Thr Ile Ser Arg Asp Glu Ala Thr Lys Leu Tyr Gln
165 170 175

gcc gtc ccg acc gtc tcc cag gcg ctc aag gtg gcg ctg aac ata tca 576
Ala Val Pro Thr Val Ser Gln Ala Leu Lys Val Ala Leu Asn Ile Ser
180 185 190

agg ctt gat cgg ccg tcg agg cgg atc tac ctg agg ctg cta gcc cag 624
Arg Leu Asp Arg Pro Ser Arg Arg Ile Tyr Leu Arg Leu Leu Ala Gln
195 200 205

agc ccc cgc tcg ggc cac agg atc ctg cta aag agg gtg cgc aag acg 672
Ser Pro Arg Ser Gly His Arg Ile Leu Leu Lys Arg Val Arg Lys Thr
210 215 220

ggc gtc agg aag aag atc ccc ata gag ctc ggc aag aac ggc gca aga 720
Gly Val Arg Lys Lys Ile Pro Ile Glu Leu Gly Lys Asn Gly Ala Arg
225 230 235 240

aag ctt gcc cgg gtg gcc gag cgc gag ggc acc gac gag acc cgg ctt 768
Lys Leu Ala Arg Val Ala Glu Arg Glu Gly Thr Asp Glu Thr Arg Leu
245 250 255

gcc aac agg ata gtc cgg gag tac ctg agg aag cag cga tga 810
Ala Asn Arg Ile Val Arg Glu Tyr Leu Arg Lys Gln Arg *
260 265



<210> 24 
<211> 269 
<212> PRT 

<213> Cenarchaeum symbiosum 



<400> 24 
Met Ala Arg Arg Tyr Lys Pro Arg Ile Lys Gln Val Leu Arg Glu Val
1 5 10 15

Pro Leu Lys Asn Val His Val Trp Lys Asp Ala Gln Ala Arg Arg Leu
20 25 30

Asp Arg Ser Arg Val Arg Glu Ile Ala Lys Ser Ile Arg Ser Glu Gly
35 40 45

Leu Gln Asn Pro Pro Val Ile Gln Arg Gly Gly Arg Gly Leu Tyr Leu
50 55 60

Leu Ile Ser Gly Asn His Arg Leu Ala Ala Leu Lys His Leu Gly Ala
65 70 75 80

Lys Lys Ser Lys Phe Leu Val Ile Thr Lys Asp Thr Glu Tyr Gly Leu
85 90 95

Glu Asp Ala Lys Ala Ala Ser Val Val Glu Asn Leu His Arg Met Gln
100 105 110

Met Ser Pro Arg Glu Leu Ala Asp Ala Cys Arg Phe Leu Ala Glu Gln
115 120 125

Met Thr Arg Ala Glu Ala Ala Arg Lys Leu Gly Met Ser Met Pro Thr
130 135 140

Phe Lys Lys Tyr His Gly Phe Ala Gly Val Pro Glu Lys Ile Lys Ala
145 150 155 160

Leu Val Pro Gly Thr Ile Ser Arg Asp Glu Ala Thr Lys Leu Tyr Gln
165 170 175

Ala Val Pro Thr Val Ser Gln Ala Leu Lys Val Ala Leu Asn Ile Ser
180 185 190

Arg Leu Asp Arg Pro Ser Arg Arg Ile Tyr Leu Arg Leu Leu Ala Gln
195 200 205

Ser Pro Arg Ser Gly His Arg Ile Leu Leu Lys Arg Val Arg Lys Thr
210 215 220

Gly Val Arg Lys Lys Ile Pro Ile Glu Leu Gly Lys Asn Gly Ala Arg
225 230 235 240

Lys Leu Ala Arg Val Ala Glu Arg Glu Gly Thr Asp Glu Thr Arg Leu
245 250 255

Ala Asn Arg Ile Val Arg Glu Tyr Leu Arg Lys Gln Arg
260 265



<210> 25 
<211> 837 
<212> DNA 

<213> Cenarchaeum symbiosum 



<220> 

<221> CDS 

<222> (1)...(837)
<400> 25

ttg tta act gtg ttt ggt aag ttt atc aca aca att agg tta gat aga 48
Met Leu Thr Val Phe Gly Lys Phe Ile Thr Thr Ile Arg Leu Asp Arg
1 5 10 15

gct gtt ccc ccg cag gcc ccc gtg cac gta ctc tat cgc gca gcc ccc 96
Ala Val Pro Pro Gln Ala Pro Val His Val Leu Tyr Arg Ala Ala Pro
20 25 30

cgg ggg aca gcc gga acc ggg ggc tgc cgg ggc ggg atc ccg ggc gtc 144
Arg Gly Thr Ala Gly Thr Gly Gly Cys Arg Gly Gly Ile Pro Gly Val
35 40 45

gat aga ata aat acg cgc ggg gcc gcg gtg cga tcg ccc gtg ctg ata 192
Asp Arg Ile Asn Thr Arg Gly Ala Ala Val Arg Ser Pro Val Leu Ile
50 55 60

ata aac tgc aaa aac tat gag gag gcc gcc ggc ggc agg atc cgc ggg 240
Ile Asn Cys Lys Asn Tyr Glu Glu Ala Ala Gly Gly Arg Ile Arg Gly
65 70 75 80

ctg gca gat gcc gcg gcc ggg gct gcc gcc agg tac ggc gtc agg ata 288
Leu Ala Asp Ala Ala Ala Gly Ala Ala Ala Arg Tyr Gly Val Arg Ile
85 90 95

gcg ata gcc ccg ccg cag cac ctg ctg ggc att ata gca ggc cgg gat 336
Ala Ile Ala Pro Pro Gln His Leu Leu Gly Ile Ile Ala Gly Arg Asp
100 105 110

ctt ggc gtg ctg gcc cag cat gtc gac gac aag ggg acg ggg agc acc 384
Leu Gly Val Leu Ala Gln His Val Asp Asp Lys Gly Thr Gly Ser Thr
115 120 125

aca ggg tat gtc gtc ccg gag ctg cta aaa cag tcg ggg gtc tcc ggg 432
Thr Gly Tyr Val Val Pro Glu Leu Leu Lys Gln Ser Gly Val Ser Gly
130 135 140

gcc ata atc aac cac agc gag cac cgc gta ccc gcg gac cag gtg gcg 480
Ala Ile Ile Asn His Ser Glu His Arg Val Pro Ala Asp Gln Val Ala
145 150 155 160

ggc ctg gta cca agg ctc agg ggc ctt ggc atg gtc tcg gtg gtc tgc 528
Gly Leu Val Pro Arg Leu Arg Gly Leu Gly Met Val Ser Val Val Cys
165 170 175

gtc agg gat ccc gcc gag gcc gcc gat ctc tcc cgg tat tgc ccc gac 576
Val Arg Asp Pro Ala Glu Ala Ala Asp Leu Ser Arg Tyr Cys Pro Asp
180 185 190

tac ata gcg ata gag cct ccc gag ctg ata ggt tcc ggc agg tcc gtc 624
Tyr Ile Ala Ile Glu Pro Pro Glu Leu Ile Gly Ser Gly Arg Ser Val
195 200 205

tcg aca gag agg ccc cag gtc ata caa gag gcc gca gag gcc atc agg 672
Ser Thr Glu Arg Pro Gln Val Ile Gln Glu Ala Ala Glu Ala Ile Arg
210 215 220

ggg gct ggc ggc gta aag ctg ctc tgc ggg gcg ggc ata acc tcc ggg 720
Gly Ala Gly Gly Val Lys Leu Leu Cys Gly Ala Gly Ile Thr Ser Gly
225 230 235 240

gcg gac gtg cgc agg gcc ctc gag ctt ggc tcc gag ggc att ctt gtg 768
Ala Asp Val Arg Arg Ala Leu Glu Leu Gly Ser Glu Gly Ile Leu Val
245 250 255

gca agc ggg gtc gta aag tcg gca gac ccc gca ggg gcc atc ggg gag 816
Ala Ser Gly Val Val Lys Ser Ala Asp Pro Ala Gly Ala Ile Gly Glu
260 265 270

ctt gcc cgg gcc atg tcc tga 837
Leu Ala Arg Ala Met Ser *
275

<210> 26 
<211> 278 
<212> PRT 

<213> Cenarchaeum symbiosum 



<400> 26 



Met Leu Thr Val Phe Gly Lys Phe Ile Thr Thr Ile Arg Leu Asp Arg
1 5 10 15

Ala Val Pro Pro Gln Ala Pro Val His Val Leu Tyr Arg Ala Ala Pro
20 25 30

Arg Gly Thr Ala Gly Thr Gly Gly Cys Arg Gly Gly Ile Pro Gly Val
35 40 45

Asp Arg Ile Asn Thr Arg Gly Ala Ala Val Arg Ser Pro Val Leu Ile
50 55 60

Ile Asn Cys Lys Asn Tyr Glu Glu Ala Ala Gly Gly Arg Ile Arg Gly
65 70 75 80

Leu Ala Asp Ala Ala Ala Gly Ala Ala Ala Arg Tyr Gly Val Arg Ile
85 90 95

Ala Ile Ala Pro Pro Gln His Leu Leu Gly Ile Ile Ala Gly Arg Asp
100 105 110

Leu Gly Val Leu Ala Gln His Val Asp Asp Lys Gly Thr Gly Ser Thr
115 120 125

Thr Gly Tyr Val Val Pro Glu Leu Leu Lys Gln Ser Gly Val Ser Gly
130 135 140

Ala Ile Ile Asn His Ser Glu His Arg Val Pro Ala Asp Gln Val Ala
145 150 155 160

Gly Leu Val Pro Arg Leu Arg Gly Leu Gly Met Val Ser Val Val Cys
165 170 175

Val Arg Asp Pro Ala Glu Ala Ala Asp Leu Ser Arg Tyr Cys Pro Asp
180 185 190

Tyr Ile Ala Ile Glu Pro Pro Glu Leu Ile Gly Ser Gly Arg Ser Val
195 200 205

Ser Thr Glu Arg Pro Gln Val Ile Gln Glu Ala Ala Glu Ala Ile Arg
210 215 220

Gly Ala Gly Gly Val Lys Leu Leu Cys Gly Ala Gly Ile Thr Ser Gly
225 230 235 240

Ala Asp Val Arg Arg Ala Leu Glu Leu Gly Ser Glu Gly Ile Leu Val
245 250 255

Ala Ser Gly Val Val Lys Ser Ala Asp Pro Ala Gly Ala Ile Gly Glu
260 265 270

Leu Ala Arg Ala Met Ser
275
<210> 27
<211> 549
<212> DNA

<213> Cenarchaeum symbiosum

<220>

<221> CDS

<222> (1)...(549)

<400> 27

atg ctg gat cca agg aaa cgg ccc agg gtg gtc aac gtt gtg agt acc 48
Met Leu Asp Pro Arg Lys Arg Pro Arg Val Val Asn Val Val Ser Thr
1 5 10 15

gcc gac ctg ggc cgg agg gtg ggc gca aaa aag atg gcc gcc atg cca 96
Ala Asp Leu Gly Arg Arg Val Gly Ala Lys Lys Met Ala Ala Met Pro
20 25 30

tgc tgc atg tac gac gag gcg gta tac ggc ggc agg tgc ggc tat atc 144
Cys Cys Met Tyr Asp Glu Ala Val Tyr Gly Gly Arg Cys Gly Tyr Ile
35 40 45

aaa aca ccc ggc atg cgg ggg cgc gtg acg gtg ttt ctc tcg ggc aag 192
Lys Thr Pro Gly Met Arg Gly Arg Val Thr Val Phe Leu Ser Gly Lys
50 55 60

atg ata tcc gtc ggc gcc agc tcc gtg agg gca tcg ttt gcg cag ctg 240
Met Ile Ser Val Gly Ala Ser Ser Val Arg Ala Ser Phe Ala Gln Leu
65 70 75 80

cac gag gcc cgg ctg cac ctg ttc cgg aac ggg gcg gcg gcc ggc ggg 288
His Glu Ala Arg Leu His Leu Phe Arg Asn Gly Ala Ala Ala Gly Gly
85 90 95

tgt aca agg ccc gtc gta cgc aat atg gtg gcg aca gtg gat gca gga 336
Cys Thr Arg Pro Val Val Arg Asn Met Val Ala Thr Val Asp Ala Gly
100 105 110

cgg act gtt ccc ata gac agg ata tcg tcg cgg ata ccc ggc gcg gtg 384
Arg Thr Val Pro Ile Asp Arg Ile Ser Ser Arg Ile Pro Gly Ala Val
115 120 125

tac gac ccg ggg tcg ttt ccc ggc atg ata cta aag ggg ctg ggc agc 432
Tyr Asp Pro Gly Ser Phe Pro Gly Met Ile Leu Lys Gly Leu Gly Ser
130 135 140

tgc agc ttc ctt gtg ttt gcg tcg gga aag gtg gtg ata gcg ggc gcc 480
Cys Ser Phe Leu Val Phe Ala Ser Gly Lys Val Val Ile Ala Gly Ala
145 150 155 160

cgg tcg cca ggc gag cta tac agg tcg tcg ttt gac ctg ctg gcg cgc 528
Arg Ser Pro Gly Glu Leu Tyr Arg Ser Ser Phe Asp Leu Leu Ala Arg
165 170 175

ctc aac ggc gcg ggc gcc tag 549
Leu Asn Gly Ala Gly Ala *
180
<210> 28
<211> 182
<212> PRT

<213> Cenarchaeum symbiosum
<400> 28

Met Leu Asp Pro Arg Lys Arg Pro Arg Val Val Asn Val Val Ser Thr
1 5 10 15

Ala Asp Leu Gly Arg Arg Val Gly Ala Lys Lys Met Ala Ala Met Pro
20 25 30

Cys Cys Met Tyr Asp Glu Ala Val Tyr Gly Gly Arg Cys Gly Tyr Ile
35 40 45

Lys Thr Pro Gly Met Arg Gly Arg Val Thr Val Phe Leu Ser Gly Lys
50 55 60

Met Ile Ser Val Gly Ala Ser Ser Val Arg Ala Ser Phe Ala Gln Leu
65 70 75 80

His Glu Ala Arg Leu His Leu Phe Arg Asn Gly Ala Ala Ala Gly Gly
85 90 95

Cys Thr Arg Pro Val Val Arg Asn Met Val Ala Thr Val Asp Ala Gly
100 105 110

Arg Thr Val Pro Ile Asp Arg Ile Ser Ser Arg Ile Pro Gly Ala Val
115 120 125

Tyr Asp Pro Gly Ser Phe Pro Gly Met Ile Leu Lys Gly Leu Gly Ser
130 135 140

Cys Ser Phe Leu Val Phe Ala Ser Gly Lys Val Val Ile Ala Gly Ala
145 150 155 160

Arg Ser Pro Gly Glu Leu Tyr Arg Ser Ser Phe Asp Leu Leu Ala Arg
165 170 175

Leu Asn Gly Ala Gly Ala
180

<210> 29 

<211> 2535 

<212> DNA 

<213> Cenarchaeum symbiosum

<220> 

<221> CDS 

<222> (1)...(2535)

<400> 29 

gtg acg gtg caa gat gcc gta gag ata ccc ccg tcg ctg ctg gta tct 48
Met Thr Val Gln Asp Ala Val Glu Ile Pro Pro Ser Leu Leu Val Ser
1 5 10 15

gca aca tac gac agc cag gca ggg gcg gtc gtc ctc aag ttt tac gag 96
Ala Thr Tyr Asp Ser Gln Ala Gly Ala Val Val Leu Lys Phe Tyr Glu
20 25 30

ccg gaa tca caa aag atc gta cac tgg acg gac aat acg ggg cac aag 144
Pro Glu Ser Gln Lys Ile Val His Trp Thr Asp Asn Thr Gly His Lys
35 40 45

ccc tac tgc tat acg agg cag ccc ccc tcc gag ctt ggg gag ctt gaa 192
Pro Tyr Cys Tyr Thr Arg Gln Pro Pro Ser Glu Leu Gly Glu Leu Glu
50 55 60

ggc agg gag gat gtg cta gga acg gag cag gtc atg cgg cac gac ctg 240
Gly Arg Glu Asp Val Leu Gly Thr Glu Gln Val Met Arg His Asp Leu
65 70 75 80

ata gcc gac aag gat gtg ccc gtc acc aag ata act gtg gcc gac ccc 288
Ile Ala Asp Lys Asp Val Pro Val Thr Lys Ile Thr Val Ala Asp Pro
85 90 95

ctt gcc ata ggc ggg acc aac tcg gag aag agc atc cgc aac atc atg 336
Leu Ala Ile Gly Gly Thr Asn Ser Glu Lys Ser Ile Arg Asn Ile Met
100 105 110

gac acg tgg gaa tcc gac ata aag tac tat gag aac tat ctg tac gac 384
Asp Thr Trp Glu Ser Asp Ile Lys Tyr Tyr Glu Asn Tyr Leu Tyr Asp
115 120 125

aag agc ctg gtc gtg ggc agg tac tat tcg gta tcc ggc ggc aag gta 432
Lys Ser Leu Val Val Gly Arg Tyr Tyr Ser Val Ser Gly Gly Lys Val
130 135 140

atc ccg cat gac atg ccc ata tcc gac gag gta aag ctg gcc ctc aag 480
Ile Pro His Asp Met Pro Ile Ser Asp Glu Val Lys Leu Ala Leu Lys
145 150 155 160

agc ctc ctc tgg gac aag gtt gta gac gag ggc atg gcg gac aga aaa 528
Ser Leu Leu Trp Asp Lys Val Val Asp Glu Gly Met Ala Asp Arg Lys
165 170 175

gag ttc cgc gag ttc ata gcg ggg tgg gcg gac ctg ctc aac cag ccc 576
Glu Phe Arg Glu Phe Ile Ala Gly Trp Ala Asp Leu Leu Asn Gln Pro
180 185 190

ata ccc agg ata cgg cgc ctc agc ttt gat atc gag gtg gat tca gag 624
Ile Pro Arg Ile Arg Arg Leu Ser Phe Asp Ile Glu Val Asp Ser Glu
195 200 205

gag ggc agg atc ccc gac ccc aag ata tcc gac agg agg gtt acg gcg 672
Glu Gly Arg Ile Pro Asp Pro Lys Ile Ser Asp Arg Arg Val Thr Ala
210 215 220

gtg ggg ttt gcc gcc acc gac ggc cta aaa cag gta ttc gtc ctg agg 720
Val Gly Phe Ala Ala Thr Asp Gly Leu Lys Gln Val Phe Val Leu Arg
225 230 235 240

agc ggc gca gaa gag ggc gag aac ggc gtg acc ccc ggt gtc gag gtg 768
Ser Gly Ala Glu Glu Gly Glu Asn Gly Val Thr Pro Gly Val Glu Val
245 250 255

gta ttc tac gac aag gaa gct gac atg atc cgc gac gcg cta tcg gta 816
Val Phe Tyr Asp Lys Glu Ala Asp Met Ile Arg Asp Ala Leu Ser Val
260 265 270

ata ggc tcg tac ccg ttt gtt ctg acg tac aac ggc gac gac ttt gac 864
Ile Gly Ser Tyr Pro Phe Val Leu Thr Tyr Asn Gly Asp Asp Phe Asp
275 280 285

atg ccg tac atg ctc aac agg gca cgg cgc ctc gga gta tct gac tct 912
Met Pro Tyr Met Leu Asn Arg Ala Arg Arg Leu Gly Val Ser Asp Ser
290 295 300

gac att cct ttg tac atg atg cgg gat tct gcc acg ctc cgg cac gga 960
Asp Ile Pro Leu Tyr Met Met Arg Asp Ser Ala Thr Leu Arg His Gly
305 310 315 320

gtc cac ctg gac ctg tac agg acc ttc tcg aac agg tca ttc cag ctg 1008
Val His Leu Asp Leu Tyr Arg Thr Phe Ser Asn Arg Ser Phe Gln Leu
325 330 335

tac gcc ttt gcg gca aag tac acg gac tat tcc ctt aac agc gtc aca 1056
Tyr Ala Phe Ala Ala Lys Tyr Thr Asp Tyr Ser Leu Asn Ser Val Thr
340 345 350

aag gcg atg ctc ggc gag ggc aag gtc gac tat ggg gtc aaa ctg ggg 1104
Lys Ala Met Leu Gly Glu Gly Lys Val Asp Tyr Gly Val Lys Leu Gly
355 360 365

gat ctc acc tta tac cag act gca aac tat tgc tat cac gac gcg cgc 1152
Asp Leu Thr Leu Tyr Gln Thr Ala Asn Tyr Cys Tyr His Asp Ala Arg
370 375 380

ctg acg ctc gag ctt agc acc ttt ggc aac gag ata ctc atg gac ctg 1200
Leu Thr Leu Glu Leu Ser Thr Phe Gly Asn Glu Ile Leu Met Asp Leu
385 390 395 400

ctg gtg gtg acc agc aga ata gcc cgg atg ccc atc gat gac atg tcc 1248
Leu Val Val Thr Ser Arg Ile Ala Arg Met Pro Ile Asp Asp Met Ser
405 410 415

cgc atg ggc gtc tcg cag tgg ata cgc agc ctg ctg tac tat gag cac 1296
Arg Met Gly Val Ser Gln Trp Ile Arg Ser Leu Leu Tyr Tyr Glu His
420 425 430

aga cag cga aac gcg ctc ata ccg cgg agg gac gag ctg gag ggc agg 1344
Arg Gln Arg Asn Ala Leu Ile Pro Arg Arg Asp Glu Leu Glu Gly Arg
435 440 445

tcg cgc gag gtg agc aac gac gcg gta ata aag gat aaa aag ttc cgc 1392
Ser Arg Glu Val Ser Asn Asp Ala Val Ile Lys Asp Lys Lys Phe Arg
450 455 460

333 93° ctt gtc gtc gag cct gaa gag ggc ata cac ttt gat gtt acg 1440
Gly Gly Leu Val Val Glu Pro Glu Glu Gly Ile His Phe Asp Val Thr
465 470 475 480

gtg atg gac ttt gcg agc ctg tat ccc agt atc ata aag gtg agg aac 1488
Val Met Asp Phe Ala Ser Leu Tyr Pro Ser Ile Ile Lys Val Arg Asn
485 490 495

ctc tcg tac gag acc gtc cgg tgc gtg cat gca gaa tgc aaa aag aac 1536
Leu Ser Tyr Glu Thr Val Arg Cys Val His Ala Glu Cys Lys Lys Asn
500 505 510

acc atc ccc gat acc aac cac tgg gta tgt aca aaa aac aac ggc ctg 1584
Thr Ile Pro Asp Thr Asn His Trp Val Cys Thr Lys Asn Asn Gly Leu
515 520 525

aca tcg atg ata atc ggc tcg ctg cgg gac ctg cgc gtc aac tat tac 1632
Thr Ser Met Ile Ile Gly Ser Leu Arg Asp Leu Arg Val Asn Tyr Tyr
530 535 540

aag agc ctc tca aag agc aca tcc att acg gag gag cag cgg cag cag 1680
Lys Ser Leu Ser Lys Ser Thr Ser Ile Thr Glu Glu Gln Arg Gln Gln
545 550 555 560

tat acc gta atc agc cag gcc ctc aag gtc gtg ctc aac gca agc tac 1728
Tyr Thr Val Ile Ser Gln Ala Leu Lys Val Val Leu Asn Ala Ser Tyr
565 570 575

ggc gtg atg ggc gcc gag ata ttc ccg ctg tac ttt tta ccc gcg gca 1776
Gly Val Met Gly Ala Glu Ile Phe Pro Leu Tyr Phe Leu Pro Ala Ala
580 585 590

gag gcc acc act gct gtc ggg cgc tat atc atc atg cag acg ata tcg 1824
Glu Ala Thr Thr Ala Val Gly Arg Tyr Ile Ile Met Gln Thr Ile Ser
595 600 605

cac tgc gag cag atg gga gtg agg gtg ctg tac ggg gac acc gat tct 1872
His Cys Glu Gln Met Gly Val Arg Val Leu Tyr Gly Asp Thr Asp Ser
610 615 620

ctg ttc ata aag gat ccc gaa gag agg cag atc cac gag ata gtc gag 1920
Leu Phe Ile Lys Asp Pro Glu Glu Arg Gln Ile His Glu Ile Val Glu
625 630 635 640

cat gca aag aag gag cac ggt gtg gag ctc gaa gtg gac aaa gag tac 1968
His Ala Lys Lys Glu His Gly Val Glu Leu Glu Val Asp Lys Glu Tyr
645 650 655

agg tat gtc gtg cta tcc aac agg aaa aaa aac tat ttc ggg gtg acc 2016
Arg Tyr Val Val Leu Ser Asn Arg Lys Lys Asn Tyr Phe Gly Val Thr
660 665 670

cgg gca ggc aag gtc gac gtc aag ggg ctg acg ggc aaa aag tcg cac 2064
Arg Ala Gly Lys Val Asp Val Lys Gly Leu Thr Gly Lys Lys Ser His
675 680 685

acg ccc ccg ttc ata aag gag ctc ttc tac tcg ctg ctc gac ata ctc 2112
Thr Pro Pro Phe Ile Lys Glu Leu Phe Tyr Ser Leu Leu Asp Ile Leu
690 695 700

tca gga gtc gag agc gag gac gag ttc gag tca gcc aag atg agg atc 2160
Ser Gly Val Glu Ser Glu Asp Glu Phe Glu Ser Ala Lys Met Arg Ile
705 710 715 720

tca aag gcg atc gcc gcg tgc ggc aag agg ctc gag gag agg cag atc 2208
Ser Lys Ala Ile Ala Ala Cys Gly Lys Arg Leu Glu Glu Arg Gln Ile
725 730 735

ccc ctc gtg gac ctg gcg ttc aat gtg atg ata agc aag gcg ccc tcc 2256
Pro Leu Val Asp Leu Ala Phe Asn Val Met Ile Ser Lys Ala Pro Ser
740 745 750

gaa tat gtc aag acc gtc ccg cag cac ata cgg gcg gca agg ctg ctg 2304
Glu Tyr Val Lys Thr Val Pro Gln His Ile Arg Ala Ala Arg Leu Leu
755 760 765

gag aac gca agg gag gtc aaa aag ggc gac ata ata tcg tac gta aag 2352
Glu Asn Ala Arg Glu Val Lys Lys Gly Asp Ile Ile Ser Tyr Val Lys
770 775 780

gtg atg aac aag acc ggc gtc aag ccg gtg gag atg gcc cgg gca ggc 2400
Val Met Asn Lys Thr Gly Val Lys Pro Val Glu Met Ala Arg Ala Gly
785 790 795 800

gag gtg gac acg tca aag tac ctc gag ttc atg gag tcg acg ctc gac 2448
Glu Val Asp Thr Ser Lys Tyr Leu Glu Phe Met Glu Ser Thr Leu Asp
805 810 815

cag ctc acc tcg tcc atg ggc ctt gac ttt gac gag ata ctc ggc aag 2496
Gln Leu Thr Ser Ser Met Gly Leu Asp Phe Asp Glu Ile Leu Gly Lys
820 825 830

cca aag cag acc ggc atg gag cag ttc ttt ttc aaa tga 2535
Pro Lys Gln Thr Gly Met Glu Gln Phe Phe Phe Lys *
835 840

<210> 30 
<211> 844 
<212> PRT 

<213> Cenarchaeum symbiosum 



<400> 30 



Met Thr Val Gln Asp Ala Val Glu Ile Pro Pro Ser Leu Leu Val Ser
1 5 10 15

Ala Thr Tyr Asp Ser Gln Ala Gly Ala Val Val Leu Lys Phe Tyr Glu
20 25 30

Pro Glu Ser Gln Lys Ile Val His Trp Thr Asp Asn Thr Gly His Lys
35 40 45

Pro Tyr Cys Tyr Thr Arg Gln Pro Pro Ser Glu Leu Gly Glu Leu Glu
50 55 60

Gly Arg Glu Asp Val Leu Gly Thr Glu Gln Val Met Arg His Asp Leu
65 70 75 80

Ile Ala Asp Lys Asp Val Pro Val Thr Lys Ile Thr Val Ala Asp Pro
85 90 95

Leu Ala Ile Gly Gly Thr Asn Ser Glu Lys Ser Ile Arg Asn Ile Met
100 105 110

Asp Thr Trp Glu Ser Asp Ile Lys Tyr Tyr Glu Asn Tyr Leu Tyr Asp
115 120 125

Lys Ser Leu Val Val Gly Arg Tyr Tyr Ser Val Ser Gly Gly Lys Val
130 135 140

Ile Pro His Asp Met Pro Ile Ser Asp Glu Val Lys Leu Ala Leu Lys
145 150 155 160

Ser Leu Leu Trp Asp Lys Val Val Asp Glu Gly Met Ala Asp Arg Lys
165 170 175

Glu Phe Arg Glu Phe Ile Ala Gly Trp Ala Asp Leu Leu Asn Gln Pro
180 185 190

Ile Pro Arg Ile Arg Arg Leu Ser Phe Asp Ile Glu Val Asp Ser Glu
195 200 205

Glu Gly Arg Ile Pro Asp Pro Lys Ile Ser Asp Arg Arg Val Thr Ala
210 215 220

Val Gly Phe Ala Ala Thr Asp Gly Leu Lys Gln Val Phe Val Leu Arg
225 230 235 240

Ser Gly Ala Glu Glu Gly Glu Asn Gly Val Thr Pro Gly Val Glu Val
245 250 255

Val Phe Tyr Asp Lys Glu Ala Asp Met Ile Arg Asp Ala Leu Ser Val
260 265 270

Ile Gly Ser Tyr Pro Phe Val Leu Thr Tyr Asn Gly Asp Asp Phe Asp
275 280 285

Met Pro Tyr Met Leu Asn Arg Ala Arg Arg Leu Gly Val Ser Asp Ser
290 295 300

Asp Ile Pro Leu Tyr Met Met Arg Asp Ser Ala Thr Leu Arg His Gly
305 310 315 320

Val His Leu Asp Leu Tyr Arg Thr Phe Ser Asn Arg Ser Phe Gln Leu
325 330 335

Tyr Ala Phe Ala Ala Lys Tyr Thr Asp Tyr Ser Leu Asn Ser Val Thr
340 345 350

Lys Ala Met Leu Gly Glu Gly Lys Val Asp Tyr Gly Val Lys Leu Gly
355 360 365

Asp Leu Thr Leu Tyr Gln Thr Ala Asn Tyr Cys Tyr His Asp Ala Arg
370 375 380

Leu Thr Leu Glu Leu Ser Thr Phe Gly Asn Glu Ile Leu Met Asp Leu
385 390 395 400

Leu Val Val Thr Ser Arg Ile Ala Arg Met Pro Ile Asp Asp Met Ser
405 410 415

Arg Met Gly Val Ser Gln Trp Ile Arg Ser Leu Leu Tyr Tyr Glu His
420 425 430

Arg Gln Arg Asn Ala Leu Ile Pro Arg Arg Asp Glu Leu Glu Gly Arg
435 440 445

Ser Arg Glu Val Ser Asn Asp Ala Val Ile Lys Asp Lys Lys Phe Arg
450 455 460

Gly Gly Leu Val Val Glu Pro Glu Glu Gly Ile His Phe Asp Val Thr
465 470 475 480

Val Met Asp Phe Ala Ser Leu Tyr Pro Ser Ile Ile Lys Val Arg Asn
485 490 495

Leu Ser Tyr Glu Thr Val Arg Cys Val His Ala Glu Cys Lys Lys Asn
500 505 510

Thr Ile Pro Asp Thr Asn His Trp Val Cys Thr Lys Asn Asn Gly Leu
515 520 525

Thr Ser Met Ile Ile Gly Ser Leu Arg Asp Leu Arg Val Asn Tyr Tyr
530 535 540

Lys Ser Leu Ser Lys Ser Thr Ser Ile Thr Glu Glu Gln Arg Gln Gln
545 550 555 560

Tyr Thr Val Ile Ser Gln Ala Leu Lys Val Val Leu Asn Ala Ser Tyr
565 570 575

Gly Val Met Gly Ala Glu Ile Phe Pro Leu Tyr Phe Leu Pro Ala Ala
580 585 590

Glu Ala Thr Thr Ala Val Gly Arg Tyr Ile Ile Met Gln Thr Ile Ser
595 600 605

His Cys Glu Gln Met Gly Val Arg Val Leu Tyr Gly Asp Thr Asp Ser
610 615 620

Leu Phe Ile Lys Asp Pro Glu Glu Arg Gln Ile His Glu Ile Val Glu
625 630 635 640

His Ala Lys Lys Glu His Gly Val Glu Leu Glu Val Asp Lys Glu Tyr
645 650 655

Arg Tyr Val Val Leu Ser Asn Arg Lys Lys Asn Tyr Phe Gly Val Thr
660 665 670

Arg Ala Gly Lys Val Asp Val Lys Gly Leu Thr Gly Lys Lys Ser His
675 680 685

Thr Pro Pro Phe Ile Lys Glu Leu Phe Tyr Ser Leu Leu Asp Ile Leu
690 695 700

Ser Gly Val Glu Ser Glu Asp Glu Phe Glu Ser Ala Lys Met Arg Ile
705 710 715 720

Ser Lys Ala Ile Ala Ala Cys Gly Lys Arg Leu Glu Glu Arg Gln Ile
725 730 735

Pro Leu Val Asp Leu Ala Phe Asn Val Met Ile Ser Lys Ala Pro Ser
740 745 750

Glu Tyr Val Lys Thr Val Pro Gln His Ile Arg Ala Ala Arg Leu Leu
755 760 765

Glu Asn Ala Arg Glu Val Lys Lys Gly Asp Ile Ile Ser Tyr Val Lys
770 775 780

Val Met Asn Lys Thr Gly Val Lys Pro Val Glu Met Ala Arg Ala Gly
785 790 795 800

Glu Val Asp Thr Ser Lys Tyr Leu Glu Phe Met Glu Ser Thr Leu Asp
805 810 815

Gln Leu Thr Ser Ser Met Gly Leu Asp Phe Asp Glu Ile Leu Gly Lys
820 825 830

Pro Lys Gln Thr Gly Met Glu Gln Phe Phe Phe Lys
835 840

<210> 31 
<211> 555 
<212> DNA 

<213> Cenarchaeum symbiosum

<220> 

<221> CDS 

<222> (1)...(555) 

<400> 31 

atg ccg ggc ggg ggc agg ctg ccc gtg agc ggc ttt gag cgc cct acc 48
Met Pro Gly Gly Gly Arg Leu Pro Val Ser Gly Phe Glu Arg Pro Thr
1 5 10 15

tgg gat gaa tat ttc atg ctg cag gcg gag ctt gca aag ctc cga tcc 96
Trp Asp Glu Tyr Phe Met Leu Gln Ala Glu Leu Ala Lys Leu Arg Ser
20 25 30

aac tgt ata gtc cgc aag gtg ggg gcc gta ata gtg agg gac cac cgg 144
Asn Cys Ile Val Arg Lys Val Gly Ala Val Ile Val Arg Asp His Arg
35 40 45

cag ctc gcc aca ggg tat aac ggg acg cct cct ggc gtc aag aac tgc 192
Gln Leu Ala Thr Gly Tyr Asn Gly Thr Pro Pro Gly Val Lys Asn Cys
50 55 60

tac gag ggc ggc tgc gag agg tgt gcc gag cgc atc gag ggc agg atc 240
Tyr Glu Gly Gly Cys Glu Arg Cys Ala Glu Arg Ile Glu Gly Arg Ile
65 70 75 80

aag tca ggc gag gcc ctg gac cgg tgc ctg tgc aac cat gca gag gcc 288
Lys Ser Gly Glu Ala Leu Asp Arg Cys Leu Cys Asn His Ala Glu Ala
85 90 95

aac gct ata atg cac tgt gcg ata ctc ggg ata ggc gcg ggg ggc ggg 336
Asn Ala Ile Met His Cys Ala Ile Leu Gly Ile Gly Ala Gly Gly Gly
100 105 110

ggg gcc acc atg tac acc acg ttc tcg ccg tgt ctg gag tgt acc aag 384
Gly Ala Thr Met Tyr Thr Thr Phe Ser Pro Cys Leu Glu Cys Thr Lys
115 120 125

atg gcc gta acg ata ggg atc agg cgg ttt gtc tgc ctt gat acc tac 432
Met Ala Val Thr Ile Gly Ile Arg Arg Phe Val Cys Leu Asp Thr Tyr
130 135 140

ccc gag aac acc tcc cgg ctg gta aaa gag aca tcc tcc gag ata acc 480
Pro Glu Asn Thr Ser Arg Leu Val Lys Glu Thr Ser Ser Glu Ile Thr
145 150 155 160

atg atg gac aag gaa aag atc tcg tac tgg gcg tca agg atg ccc gga 528
Met Met Asp Lys Glu Lys Ile Ser Tyr Trp Ala Ser Arg Met Pro Gly
165 170 175

ggc agc aag gag gtg ccg gtg cgg tga 555
Gly Ser Lys Glu Val Pro Val Arg *
180

<210> 32
<211> 184
<212> PRT

<213> Cenarchaeum symbiosum
<400> 32

Met Pro Gly Gly Gly Arg Leu Pro Val Ser Gly Phe Glu Arg Pro Thr
1 5 10 15

Trp Asp Glu Tyr Phe Met Leu Gln Ala Glu Leu Ala Lys Leu Arg Ser
20 25 30

Asn Cys Ile Val Arg Lys Val Gly Ala Val Ile Val Arg Asp His Arg
35 40 45

Gln Leu Ala Thr Gly Tyr Asn Gly Thr Pro Pro Gly Val Lys Asn Cys
50 55 60

Tyr Glu Gly Gly Cys Glu Arg Cys Ala Glu Arg Ile Glu Gly Arg Ile
65 70 75 80

Lys Ser Gly Glu Ala Leu Asp Arg Cys Leu Cys Asn His Ala Glu Ala
85 90 95

Asn Ala Ile Met His Cys Ala Ile Leu Gly Ile Gly Ala Gly Gly Gly
100 105 110

Gly Ala Thr Met Tyr Thr Thr Phe Ser Pro Cys Leu Glu Cys Thr Lys
115 120 125

Met Ala Val Thr Ile Gly Ile Arg Arg Phe Val Cys Leu Asp Thr Tyr
130 135 140

Pro Glu Asn Thr Ser Arg Leu Val Lys Glu Thr Ser Ser Glu Ile Thr
145 150 155 160

Met Met Asp Lys Glu Lys Ile Ser Tyr Trp Ala Ser Arg Met Pro Gly
165 170 175

Gly Ser Lys Glu Val Pro Val Arg
180

<210> 33

<211> 1509

<212> DNA

<213> Cenarchaeum symbiosum

<220>

<221> CDS

<222> (1)...(1509)

<400> 33

gtg gag act ggg cac ata acg ggc agg tac atc gag ccc ggt gcc gtc 48
Met Glu Thr Gly His Ile Thr Gly Arg Tyr Ile Glu Pro Gly Ala Val
1 5 10 15

gag agg cgc gac tac cag gtg ggc ctg gcg gaa cag gcc ata cgg gag 96
Glu Arg Arg Asp Tyr Gln Val Gly Leu Ala Glu Gln Ala Ile Arg Glu
20 25 30

aac tgt atc gtg gtg ctc ccg acg ggc ctc ggc aag act gcc gtc gcc 144
Asn Cys Ile Val Val Leu Pro Thr Gly Leu Gly Lys Thr Ala Val Ala
35 40 45

ctc cag gtg atc gcc cac tat ctc gac gag ggc cgc ggg gcg ctc ttc 192
Leu Gln Val Ile Ala His Tyr Leu Asp Glu Gly Arg Gly Ala Leu Phe
50 55 60

ctt gcc cct aca agg gtc ctg gta aac cag cac cgc cag ttc ctg ggc 240
Leu Ala Pro Thr Arg Val Leu Val Asn Gln His Arg Gln Phe Leu Gly
65 70 75 80

agg gcc ctt acc ata tcc gat att aca ctg gtc acg gga gag gac acc 288
Arg Ala Leu Thr Ile Ser Asp Ile Thr Leu Val Thr Gly Glu Asp Thr
85 90 95

att ccc cgg cgc aaa aag gcg tgg gga ggc agc gtg atc tgc gcc acg 336
Ile Pro Arg Arg Lys Lys Ala Trp Gly Gly Ser Val Ile Cys Ala Thr
100 105 110

ccc gag ata gca aga aat gat ata gag cgc ggc ctg gtc ccg ctc gaa 384
Pro Glu Ile Ala Arg Asn Asp Ile Glu Arg Gly Leu Val Pro Leu Glu
115 120 125

cag ttc ggc ctg gtc ata ttc gac gag gcc cac agg gcg gtg ggc gac 432
Gln Phe Gly Leu Val Ile Phe Asp Glu Ala His Arg Ala Val Gly Asp
130 135 140

tat gcc tat tct tcc ata gcg cgg gcg gta ggg gat aac tcc agg atg 480
Tyr Ala Tyr Ser Ser Ile Ala Arg Ala Val Gly Asp Asn Ser Arg Met
145 150 155 160

gtg ggc atg act gcg acg ctt ccc agc gag agg gag aag gca gac gag 528
Val Gly Met Thr Ala Thr Leu Pro Ser Glu Arg Glu Lys Ala Asp Glu
165 170 175

ata atg ggc acc ctg ctc tcc agg agc ata gcc cag agg aca gaa gac 576
Ile Met Gly Thr Leu Leu Ser Arg Ser Ile Ala Gln Arg Thr Glu Asp
180 185 190

gac ccg gac gta aag ccc tat gta cag gag act gcc acc gag tgg ata 624
Asp Pro Asp Val Lys Pro Tyr Val Gln Glu Thr Ala Thr Glu Trp Ile
195 200 205

aag gtg gat ctt ccc ccc gag atg aag gag ata cag agg ctc ctc aag 672
Lys Val Asp Leu Pro Pro Glu Met Lys Glu Ile Gln Arg Leu Leu Lys
210 215 220

ctg gcc ctc gac gag agg tat tcc tcc ctc aag agg tgc ggg tac gat 720
Leu Ala Leu Asp Glu Arg Tyr Ser Ser Leu Lys Arg Cys Gly Tyr Asp
225 230 235 240

ctt ggc tcg aac agg tcg ctc tcg gcg ctg ctc cgg ctg cgc atg gtg 768
Leu Gly Ser Asn Arg Ser Leu Ser Ala Leu Leu Arg Leu Arg Met Val
245 250 255

gtg ctt ggc ggc aac agg cgc gcg gcc aag ccg ctg ttc act gcg ata 816
Val Leu Gly Gly Asn Arg Arg Ala Ala Lys Pro Leu Phe Thr Ala Ile
260 265 270

cgc ata acg tac gcg cta aac ata ttc gag gcg cac ggg gtc acg ccc 864
Arg Ile Thr Tyr Ala Leu Asn Ile Phe Glu Ala His Gly Val Thr Pro
275 280 285

ttt cta aag ttc tgc gag agg acc tcc aag aaa aag ggc gtc ggc gtg 912
Phe Leu Lys Phe Cys Glu Arg Thr Ser Lys Lys Lys Gly Val Gly Val
290 295 300

gcg gag ctg ttc gaa cag gac cgg aac ttt aca ggg gcc atc gcg cgc 960
Ala Glu Leu Phe Glu Gln Asp Arg Asn Phe Thr Gly Ala Ile Ala Arg
305 310 315 320

gca aag gcc gcg cag gcg gca ggc atg gag cat ccc aag ata cca aag 1008
Ala Lys Ala Ala Gln Ala Ala Gly Met Glu His Pro Lys Ile Pro Lys
325 330 335

ctc gag gat gcc gtc cgc ggg gcc cgg gga aag gcg ctg gtc ttt acg 1056
Leu Glu Asp Ala Val Arg Gly Ala Arg Gly Lys Ala Leu Val Phe Thr
340 345 350

agc tat cgt gat tct gtc gac ctc ata cac tca aga ctc aag gcg gcc 1104
Ser Tyr Arg Asp Ser Val Asp Leu Ile His Ser Arg Leu Lys Ala Ala
355 360 365

ggg ata aac tcg ggc atc ctg ata gga aag gcg gga gaa aag ggc cta 1152
Gly Ile Asn Ser Gly Ile Leu Ile Gly Lys Ala Gly Glu Lys Gly Leu
370 375 380

aag cag aga aaa cag gtg gag act gtg gca aag ttc cgt gac ggc ggg 1200
Lys Gln Arg Lys Gln Val Glu Thr Val Ala Lys Phe Arg Asp Gly Gly
385 390 395 400

tac gac gtg ctg gta tcg acg agg gtc ggc gag gag ggg ctc gac ata 1248
Tyr Asp Val Leu Val Ser Thr Arg Val Gly Glu Glu Gly Leu Asp Ile
405 410 415

tcg gag gtc aac ctg gtg ata ttc tat gac aat gtg cca agc tcg atc 1296
Ser Glu Val Asn Leu Val Ile Phe Tyr Asp Asn Val Pro Ser Ser Ile
420 425 430

agg tac gtg cag agg agg ggg aga aca ggc aga aag gac gcc ggc agg 1344
Arg Tyr Val Gln Arg Arg Gly Arg Thr Gly Arg Lys Asp Ala Gly Arg
435 440 445

ctg ata gta ttg atg gca aag ggg acg ata gac gag gca tac tat tgg 1392
Leu Ile Val Leu Met Ala Lys Gly Thr Ile Asp Glu Ala Tyr Tyr Trp
450 455 460

att ggt cgg cgc aag atg agc gcc gcc aag ggc atg ggt gag agg atg 1440
Ile Gly Arg Arg Lys Met Ser Ala Ala Lys Gly Met Gly Glu Arg Met
465 470 475 480

aac cgg tcg ctg gcg gca ggc ggg gct gct gcc aag gcc gct cca aag 1488
Asn Arg Ser Leu Ala Ala Gly Gly Ala Ala Ala Lys Ala Ala Pro Lys
485 490 495

gga ctc gag ggg tac ttt tag 1509
Gly Leu Glu Gly Tyr Phe *
500

<210> 34 
<211> 502 
<212> PRT 

<213> Cenarchaeum symbiosum 
<400> 34

Met Glu Thr Gly His Ile Thr Gly Arg Tyr Ile Glu Pro Gly Ala Val
1 5 10 15

Glu Arg Arg Asp Tyr Gln Val Gly Leu Ala Glu Gln Ala Ile Arg Glu
20 25 30

Asn Cys Ile Val Val Leu Pro Thr Gly Leu Gly Lys Thr Ala Val Ala
35 40 45

Leu Gln Val Ile Ala His Tyr Leu Asp Glu Gly Arg Gly Ala Leu Phe
50 55 60

Leu Ala Pro Thr Arg Val Leu Val Asn Gln His Arg Gln Phe Leu Gly
65 70 75 80

Arg Ala Leu Thr Ile Ser Asp Ile Thr Leu Val Thr Gly Glu Asp Thr
85 90 95

Ile Pro Arg Arg Lys Lys Ala Trp Gly Gly Ser Val Ile Cys Ala Thr
100 105 110

Pro Glu Ile Ala Arg Asn Asp Ile Glu Arg Gly Leu Val Pro Leu Glu
115 120 125

Gln Phe Gly Leu Val Ile Phe Asp Glu Ala His Arg Ala Val Gly Asp
130 135 140

Tyr Ala Tyr Ser Ser Ile Ala Arg Ala Val Gly Asp Asn Ser Arg Met
145 150 155 160

Val Gly Met Thr Ala Thr Leu Pro Ser Glu Arg Glu Lys Ala Asp Glu
165 170 175

Ile Met Gly Thr Leu Leu Ser Arg Ser Ile Ala Gln Arg Thr Glu Asp
180 185 190

Asp Pro Asp Val Lys Pro Tyr Val Gln Glu Thr Ala Thr Glu Trp Ile
195 200 205

Lys Val Asp Leu Pro Pro Glu Met Lys Glu Ile Gln Arg Leu Leu Lys
210 215 220

Leu Ala Leu Asp Glu Arg Tyr Ser Ser Leu Lys Arg Cys Gly Tyr Asp
225 230 235 240

Leu Gly Ser Asn Arg Ser Leu Ser Ala Leu Leu Arg Leu Arg Met Val
245 250 255

Val Leu Gly Gly Asn Arg Arg Ala Ala Lys Pro Leu Phe Thr Ala Ile
260 265 270

Arg Ile Thr Tyr Ala Leu Asn Ile Phe Glu Ala His Gly Val Thr Pro
275 280 285

Phe Leu Lys Phe Cys Glu Arg Thr Ser Lys Lys Lys Gly Val Gly Val
290 295 300

Ala Glu Leu Phe Glu Gln Asp Arg Asn Phe Thr Gly Ala Ile Ala Arg
305 310 315 320

Ala Lys Ala Ala Gln Ala Ala Gly Met Glu His Pro Lys Ile Pro Lys
325 330 335

Leu Glu Asp Ala Val Arg Gly Ala Arg Gly Lys Ala Leu Val Phe Thr
340 345 350

Ser Tyr Arg Asp Ser Val Asp Leu Ile His Ser Arg Leu Lys Ala Ala
355 360 365

Gly Ile Asn Ser Gly Ile Leu Ile Gly Lys Ala Gly Glu Lys Gly Leu
370 375 380

Lys Gln Arg Lys Gln Val Glu Thr Val Ala Lys Phe Arg Asp Gly Gly
385 390 395 400

Tyr Asp Val Leu Val Ser Thr Arg Val Gly Glu Glu Gly Leu Asp Ile
405 410 415

Ser Glu Val Asn Leu Val Ile Phe Tyr Asp Asn Val Pro Ser Ser Ile
420 425 430

Arg Tyr Val Gln Arg Arg Gly Arg Thr Gly Arg Lys Asp Ala Gly Arg
435 440 445

Leu Ile Val Leu Met Ala Lys Gly Thr Ile Asp Glu Ala Tyr Tyr Trp
450 455 460

Ile Gly Arg Arg Lys Met Ser Ala Ala Lys Gly Met Gly Glu Arg Met
465 470 475 480

Asn Arg Ser Leu Ala Ala Gly Gly Ala Ala Ala Lys Ala Ala Pro Lys
485 490 495

Gly Leu Glu Gly Tyr Phe
500

<210> 35
<211> 402
<212> DNA

<213> Cenarchaeum symbiosum 

<220> 

<221> CDS 

<222> (1) . . . (402) 

<400> 35 

gtg tca tcg tac ttt acc ata aag acc gcc aac ctg gcc ctg ccc gac 48
Met Ser Ser Tyr Phe Thr Ile Lys Thr Ala Asn Leu Ala Leu Pro Asp
1 5 10 15

gtg gtc aaa aag tac aac cac gtc ctg gca tgc aag agc gag gtg atg 96
Val Val Lys Lys Tyr Asn His Val Leu Ala Cys Lys Ser Glu Val Met
20 25 30

agg gcc gag aag cag atc cag acg tcc atc tcc tcg tct agc ggg ctc 144
Arg Ala Glu Lys Gln Ile Gln Thr Ser Ile Ser Ser Ser Ser Gly Leu
35 40 45

gac aag tac tcg gag ctc aag caa cag ttc aac tcc cgg ata acc gag 192
Asp Lys Tyr Ser Glu Leu Lys Gln Gln Phe Asn Ser Arg Ile Thr Glu
50 55 60

ttc tac cgc tcg ata gaa gag ctg gaa aag acc ggt gcg gtg gtc aag 240
Phe Tyr Arg Ser Ile Glu Glu Leu Glu Lys Thr Gly Ala Val Val Lys
65 70 75 80

agc ata gac gag ggc ctg ctg gac ttt ccc gca aag cgc ttt ggg gac 288
Ser Ile Asp Glu Gly Leu Leu Asp Phe Pro Ala Lys Arg Phe Gly Asp
85 90 95

gac atc tgg ctg tgc tgg aag aca ggc gag cgc gag atc aag ttc tgg 336
Asp Ile Trp Leu Cys Trp Lys Thr Gly Glu Arg Glu Ile Lys Phe Trp
100 105 110

cat gaa aag gac tct ggt ttt ggc gga aga aag ccc ata gag gta agt 384
His Glu Lys Asp Ser Gly Phe Gly Gly Arg Lys Pro Ile Glu Val Ser
115 120 125

gac gag tca cta gtg tag 402
Asp Glu Ser Leu Val *
130

<210> 36 
<211> 133 
<212> PRT 

<213> Cenarchaeum symbiosum 
<400> 36 



Met Ser Ser Tyr Phe Thr Ile Lys Thr Ala Asn Leu Ala Leu Pro Asp
1 5 10 15

Val Val Lys Lys Tyr Asn His Val Leu Ala Cys Lys Ser Glu Val Met
20 25 30

Arg Ala Glu Lys Gln Ile Gln Thr Ser Ile Ser Ser Ser Ser Gly Leu
35 40 45

Asp Lys Tyr Ser Glu Leu Lys Gln Gln Phe Asn Ser Arg Ile Thr Glu
50 55 60

Phe Tyr Arg Ser Ile Glu Glu Leu Glu Lys Thr Gly Ala Val Val Lys
65 70 75 80

Ser Ile Asp Glu Gly Leu Leu Asp Phe Pro Ala Lys Arg Phe Gly Asp
85 90 95

Asp Ile Trp Leu Cys Trp Lys Thr Gly Glu Arg Glu Ile Lys Phe Trp
100 105 110

His Glu Lys Asp Ser Gly Phe Gly Gly Arg Lys Pro Ile Glu Val Ser
115 120 125

Asp Glu Ser Leu Val
130

<210> 37 
<211> 879 
<212> DNA 

<213> Cenarchaeum symbiosum 

<220> 

<221> CDS 

<222> (1) . . . (879) 

<400> 37 

atg ctc tcc gcc tgg ttg cgc gta ata cgc gtc cgc ttc ctg ctc gcg 48
Met Leu Ser Ala Trp Leu Arg Val Ile Arg Val Arg Phe Leu Leu Ala
1 5 10 15

tcg gtg ata gcc gtc tcg gcg ggc ctc gcc ctc tcc tgg tgg cac ggc 96
Ser Val Ile Ala Val Ser Ala Gly Leu Ala Leu Ser Trp Trp His Gly
20 25 30

cac gaa ata gac gca ttc tcc gcc gcg ctc acc atg gcc ggc gtg gcc 144
His Glu Ile Asp Ala Phe Ser Ala Ala Leu Thr Met Ala Gly Val Ala
35 40 45

gcg ctc cac gca agc gtg gac atg ctc aac gat tat tcg gac tac aag 192
Ala Leu His Ala Ser Val Asp Met Leu Asn Asp Tyr Ser Asp Tyr Lys
50 55 60

cgc ggc ata gat acc ata acc aag agg acc ccg atg agc ggc gga aca 240
Arg Gly Ile Asp Thr Ile Thr Lys Arg Thr Pro Met Ser Gly Gly Thr
65 70 75 80

ggg gtg ctg cca gaa ggc ctg ctt acc ccc ggc cag gtg cac cgc gcc 288
Gly Val Leu Pro Glu Gly Leu Leu Thr Pro Gly Gln Val His Arg Ala
85 90 95

ggc atc ata tcg ctg gtc ctg ggc tct gct gtc ggc gcg tac ttt gtg 336
Gly Ile Ile Ser Leu Val Leu Gly Ser Ala Val Gly Ala Tyr Phe Val
100 105 110

gtc aca acg ggg ccc gtc ata gcc atg ata ctc ggc ttt gcc gta gtc 384
Val Thr Thr Gly Pro Val Ile Ala Met Ile Leu Gly Phe Ala Val Val
115 120 125

tcg ata tac ttt tac tcg acg agg att gta gac tcg ggc ctc tcc gag 432
Ser Ile Tyr Phe Tyr Ser Thr Arg Ile Val Asp Ser Gly Leu Ser Glu
130 135 140

gtc ttt gtg gcc gtc aag ggg gcg atg atc gtc ctt ggc gcc tac tac 480
Val Phe Val Ala Val Lys Gly Ala Met Ile Val Leu Gly Ala Tyr Tyr
145 150 155 160

ata cag gcg ccc gag ata acg cct gcc gcc gtt ctg gtg ggg gcg gcc 528
Ile Gln Ala Pro Glu Ile Thr Pro Ala Ala Val Leu Val Gly Ala Ala
165 170 175

gtg ggc gcc ctc tcg tcg gcg gtc ctc ttt gtg gcg tcg ttt cca gac 576
Val Gly Ala Leu Ser Ser Ala Val Leu Phe Val Ala Ser Phe Pro Asp
180 185 190

cac gat gcg gac aag tcc cgc ggc aga aag acg ctt gtt ata atc ctg 624
His Asp Ala Asp Lys Ser Arg Gly Arg Lys Thr Leu Val Ile Ile Leu
195 200 205

ggc aag gag agg gcc tcg cgg atc ctc tgg gtg ttc ccc gca gtg gca 672
Gly Lys Glu Arg Ala Ser Arg Ile Leu Trp Val Phe Pro Ala Val Ala
210 215 220

tac tcg tcc gtt ata acg ggg gtc atc ctg cag ttc ctg ccg gtg cat 720
Tyr Ser Ser Val Ile Thr Gly Val Ile Leu Gln Phe Leu Pro Val His
225 230 235 240

gca cta acc atg ctg ctt gca gcc ccc ctt gca gta att gcg gca aaa 768
Ala Leu Thr Met Leu Leu Ala Ala Pro Leu Ala Val Ile Ala Ala Lys
245 250 255

ggc ctt gcc agg gag tac ggc ggg gac ggg atc ata cgg gtc atg cgc 816
Gly Leu Ala Arg Glu Tyr Gly Gly Asp Gly Ile Ile Arg Val Met Arg
260 265 270

ggc acg ctg cgg ttt agc agg gtt gca ggc gcc ctg ctg gtg ttg ggc 864
Gly Thr Leu Arg Phe Ser Arg Val Ala Gly Ala Leu Leu Val Leu Gly
275 280 285

att ctg ttg ggc tga 879
Ile Leu Leu Gly *
290

<210> 38 

<211> 292

<212> PRT

<213> Cenarchaeum symbiosum

<400> 38

Met Leu Ser Ala Trp Leu Arg Val Ile Arg Val Arg Phe Leu Leu Ala
1 5 10 15

Ser Val Ile Ala Val Ser Ala Gly Leu Ala Leu Ser Trp Trp His Gly
20 25 30

His Glu Ile Asp Ala Phe Ser Ala Ala Leu Thr Met Ala Gly Val Ala
35 40 45

Ala Leu His Ala Ser Val Asp Met Leu Asn Asp Tyr Ser Asp Tyr Lys
50 55 60

Arg Gly Ile Asp Thr Ile Thr Lys Arg Thr Pro Met Ser Gly Gly Thr
65 70 75 80

Gly Val Leu Pro Glu Gly Leu Leu Thr Pro Gly Gln Val His Arg Ala
85 90 95

Gly Ile Ile Ser Leu Val Leu Gly Ser Ala Val Gly Ala Tyr Phe Val
100 105 110

Val Thr Thr Gly Pro Val Ile Ala Met Ile Leu Gly Phe Ala Val Val
115 120 125

Ser Ile Tyr Phe Tyr Ser Thr Arg Ile Val Asp Ser Gly Leu Ser Glu
130 135 140

Val Phe Val Ala Val Lys Gly Ala Met Ile Val Leu Gly Ala Tyr Tyr
145 150 155 160

Ile Gln Ala Pro Glu Ile Thr Pro Ala Ala Val Leu Val Gly Ala Ala
165 170 175

Val Gly Ala Leu Ser Ser Ala Val Leu Phe Val Ala Ser Phe Pro Asp
180 185 190

His Asp Ala Asp Lys Ser Arg Gly Arg Lys Thr Leu Val Ile Ile Leu
195 200 205

Gly Lys Glu Arg Ala Ser Arg Ile Leu Trp Val Phe Pro Ala Val Ala
210 215 220

Tyr Ser Ser Val Ile Thr Gly Val Ile Leu Gln Phe Leu Pro Val His
225 230 235 240

Ala Leu Thr Met Leu Leu Ala Ala Pro Leu Ala Val Ile Ala Ala Lys
245 250 255

Gly Leu Ala Arg Glu Tyr Gly Gly Asp Gly Ile Ile Arg Val Met Arg
260 265 270

Gly Thr Leu Arg Phe Ser Arg Val Ala Gly Ala Leu Leu Val Leu Gly
275 280 285

Ile Leu Leu Gly
290

<210> 39 
<211> 1119 
<212> DNA 

<213> Cenarchaeum symbiosum

<220>

<221> CDS 

<222> (1) . . . (1119) 

<400> 39 

atg atc agc ggg cac gcc acg gcc gag ggt aca cgc agg ata gcc gag 48
Met Ile Ser Gly His Ala Thr Ala Glu Gly Thr Arg Arg Ile Ala Glu
1 5 10 15

atg tcg ggc gcc cat atc gac aac tac aag atg gtc gac ggg ctg cac 96
Met Ser Gly Ala His Ile Asp Asn Tyr Lys Met Val Asp Gly Leu His
20 25 30

ctc tcc aac gtg ggg atg ggc acc tac ctt ggc gac gcg gat gac gcc 144
Leu Ser Asn Val Gly Met Gly Thr Tyr Leu Gly Asp Ala Asp Asp Ala
35 40 45

acc gac agg gcc gtc acg gac gca gtc aag agg tcc gtc aaa aca ggc 192
Thr Asp Arg Ala Val Thr Asp Ala Val Lys Arg Ser Val Lys Thr Gly
50 55 60

ata aac gtc ata gat acg gcg ata aac tac cgc ctc cag agg gcc gag 240
Ile Asn Val Ile Asp Thr Ala Ile Asn Tyr Arg Leu Gln Arg Ala Glu
65 70 75 80

cgc tct gtc ggc agg gcc gtc acg gag ctc tca gaa gag ggg ctc gta 288
Arg Ser Val Gly Arg Ala Val Thr Glu Leu Ser Glu Glu Gly Leu Val
85 90 95

tca agg gac caa ata ttc ata tcg aca aag gcg ggc tat gta aca aac 336
Ser Arg Asp Gln Ile Phe Ile Ser Thr Lys Ala Gly Tyr Val Thr Asn
100 105 110

gac tcc gag gtc tcg ctt gac ttt tgg gag tat gtg aaa aaa gag tac 384
Asp Ser Glu Val Ser Leu Asp Phe Trp Glu Tyr Val Lys Lys Glu Tyr
115 120 125

gtc ggg ggc ggc gtg atc cag gca ggc gac ata tcc tcc gga tac cac 432
Val Gly Gly Gly Val Ile Gln Ala Gly Asp Ile Ser Ser Gly Tyr His
130 135 140

tgc atg aag ccc gcc tat cta gag gac cag ctg aag agg agc ctt gca 480
Cys Met Lys Pro Ala Tyr Leu Glu Asp Gln Leu Lys Arg Ser Leu Ala
145 150 155 160

aac atg ggc ctc gac tgt atc gac ctt gtc tac gtg cac aac ccc gtc 528
Asn Met Gly Leu Asp Cys Ile Asp Leu Val Tyr Val His Asn Pro Val
165 170 175

gag ggg cag atc aag gac cgc ccc ata ccg gag atc ctc gac tgt ata 576
Glu Gly Gln Ile Lys Asp Arg Pro Ile Pro Glu Ile Leu Asp Cys Ile
180 185 190

gga gag gcc ttt gcc atg tac gag aag gca agg gag gat ggc cgc atc 624
Gly Glu Ala Phe Ala Met Tyr Glu Lys Ala Arg Glu Asp Gly Arg Ile
195 200 205

aga tac tat ggg ctc gcc acg tgg gag tgc ttt cgt gtt gca ggg gac 672
Arg Tyr Tyr Gly Leu Ala Thr Trp Glu Cys Phe Arg Val Ala Gly Asp
210 215 220

aac ccg cag aat gtc cag ctc gaa gac gtt gta aag aag gcc aaa gac 720
Asn Pro Gln Asn Val Gln Leu Glu Asp Val Val Lys Lys Ala Lys Asp
225 230 235 240

gca ggc ggg gac aac cac gga ttc aag ttc ata cag ctg ccc ttc aac 768
Ala Gly Gly Asp Asn His Gly Phe Lys Phe Ile Gln Leu Pro Phe Asn
245 250 255

cag tac ttt gac cag gct tac atg cta aag aac cag acg gtg gac ggc 816
Gln Tyr Phe Asp Gln Ala Tyr Met Leu Lys Asn Gln Thr Val Asp Gly
260 265 270

aga aag ctg tcc ata ctg gat gcg gca gta tcc ctt ggc gtc ggt gtg 864
Arg Lys Leu Ser Ile Leu Asp Ala Ala Val Ser Leu Gly Val Gly Val
275 280 285

ttc acg agt gtc ccg ttc atg caa ggc aag ctg ctc gag cct ggc ctg 912
Phe Thr Ser Val Pro Phe Met Gln Gly Lys Leu Leu Glu Pro Gly Leu
290 295 300

ctg ccg gag ttt ggc ggg ctc tcc ccc gcc ctg cga tcc ctg cag ttt 960
Leu Pro Glu Phe Gly Gly Leu Ser Pro Ala Leu Arg Ser Leu Gln Phe
305 310 315 320

atc agg tct aca cca ggc gtg ctt gcc ccc ctg ccg ggg cac aac tca 1008
Ile Arg Ser Thr Pro Gly Val Leu Ala Pro Leu Pro Gly His Asn Ser
325 330 335

gct gcg cat aca gac gag aac ctc aag atc atg ggc gtg ccc ccc atc 1056
Ala Ala His Thr Asp Glu Asn Leu Lys Ile Met Gly Val Pro Pro Ile
340 345 350

ccg cct gac aag ttc ggg gag ctt gtg gcc agc ctc acc tcg tgg tcg 1104
Pro Pro Asp Lys Phe Gly Glu Leu Val Ala Ser Leu Thr Ser Trp Ser
355 360 365

ccc ggt cag aaa tag 1119
Pro Gly Gln Lys *
370

<210> 40 
<211> 372 
<212> PRT 

<213> Cenarchaeum symbiosum 
<400> 40 

Met Ile Ser Gly His Ala Thr Ala Glu Gly Thr Arg Arg Ile Ala Glu
1 5 10 15

Met Ser Gly Ala His Ile Asp Asn Tyr Lys Met Val Asp Gly Leu His
20 25 30

Leu Ser Asn Val Gly Met Gly Thr Tyr Leu Gly Asp Ala Asp Asp Ala
35 40 45

Thr Asp Arg Ala Val Thr Asp Ala Val Lys Arg Ser Val Lys Thr Gly
50 55 60

Ile Asn Val Ile Asp Thr Ala Ile Asn Tyr Arg Leu Gln Arg Ala Glu
65 70 75 80

Arg Ser Val Gly Arg Ala Val Thr Glu Leu Ser Glu Glu Gly Leu Val
85 90 95

Ser Arg Asp Gln Ile Phe Ile Ser Thr Lys Ala Gly Tyr Val Thr Asn
100 105 110

Asp Ser Glu Val Ser Leu Asp Phe Trp Glu Tyr Val Lys Lys Glu Tyr
115 120 125

Val Gly Gly Gly Val Ile Gln Ala Gly Asp Ile Ser Ser Gly Tyr His
130 135 140

Cys Met Lys Pro Ala Tyr Leu Glu Asp Gln Leu Lys Arg Ser Leu Ala
145 150 155 160

Asn Met Gly Leu Asp Cys Ile Asp Leu Val Tyr Val His Asn Pro Val
165 170 175

Glu Gly Gln Ile Lys Asp Arg Pro Ile Pro Glu Ile Leu Asp Cys Ile
180 185 190

Gly Glu Ala Phe Ala Met Tyr Glu Lys Ala Arg Glu Asp Gly Arg Ile
195 200 205

Arg Tyr Tyr Gly Leu Ala Thr Trp Glu Cys Phe Arg Val Ala Gly Asp
210 215 220

Asn Pro Gln Asn Val Gln Leu Glu Asp Val Val Lys Lys Ala Lys Asp
225 230 235 240

Ala Gly Gly Asp Asn His Gly Phe Lys Phe Ile Gln Leu Pro Phe Asn
245 250 255

Gln Tyr Phe Asp Gln Ala Tyr Met Leu Lys Asn Gln Thr Val Asp Gly
260 265 270

Arg Lys Leu Ser Ile Leu Asp Ala Ala Val Ser Leu Gly Val Gly Val
275 280 285

Phe Thr Ser Val Pro Phe Met Gln Gly Lys Leu Leu Glu Pro Gly Leu
290 295 300

Leu Pro Glu Phe Gly Gly Leu Ser Pro Ala Leu Arg Ser Leu Gln Phe
305 310 315 320

Ile Arg Ser Thr Pro Gly Val Leu Ala Pro Leu Pro Gly His Asn Ser
325 330 335

Ala Ala His Thr Asp Glu Asn Leu Lys Ile Met Gly Val Pro Pro Ile
340 345 350

Pro Pro Asp Lys Phe Gly Glu Leu Val Ala Ser Leu Thr Ser Trp Ser
355 360 365

Pro Gly Gln Lys
370



<210> 41 
<211> 1107 
<212> DNA 

<213> Cenarchaeum symbiosum 



<220> 
<221> CDS 

<222> (1) . . . (1107) 



<400> 41 

atg gca cgg ggg cct atc ttg agt gaa aag ttc cag ata ctg cag ggc 48
Met Ala Arg Gly Pro Ile Leu Ser Glu Lys Phe Gln Ile Leu Gln Gly
1 5 10 15

gac gcc cgg gag gtg ctg ccg cgg ctg gca aag aat aca gcc gag cgc 96
Asp Ala Arg Glu Val Leu Pro Arg Leu Ala Lys Asn Thr Ala Glu Arg
20 25 30

ggc agg tac aga ctg gcg gta aca tcc cct ccc tat tac ggg cac aga 144
Gly Arg Tyr Arg Leu Ala Val Thr Ser Pro Pro Tyr Tyr Gly His Arg
35 40 45

aag tac ggg tcg gag ccc tcc gag ctg ggc cag gaa aag acg cca gac 192
Lys Tyr Gly Ser Glu Pro Ser Glu Leu Gly Gln Glu Lys Thr Pro Asp
50 55 60

gag ttc atc gag gag ctg gca gga gta ttc aag agc tgc atg gac ctg 240
Glu Phe Ile Glu Glu Leu Ala Gly Val Phe Lys Ser Cys Met Asp Leu
65 70 75 80

cta aca gac gac ggg agc ctc ttc ata gtg ata ggt gat acc agg agg 288
Leu Thr Asp Asp Gly Ser Leu Phe Ile Val Ile Gly Asp Thr Arg Arg
85 90 95

cgg cgc cac aag ctg atg gtc ccg cac cgg ctc gcg cta agg ctg gtg 336
Arg Arg His Lys Leu Met Val Pro His Arg Leu Ala Leu Arg Leu Val
100 105 110

gat ctt ggg tac cat ttc cag gag gat ata atc tgg tac aag cga aac 384
Asp Leu Gly Tyr His Phe Gln Glu Asp Ile Ile Trp Tyr Lys Arg Asn
115 120 125

gcc atc tcg caa agc tcg cgg caa aac ctg acg cag gcg tac gag ttt 432
Ala Ile Ser Gln Ser Ser Arg Gln Asn Leu Thr Gln Ala Tyr Glu Phe
130 135 140

gtt ctg gtc ctc tca aag tcg gat acc ccc gcc tat gac ata aac ccg 480
Val Leu Val Leu Ser Lys Ser Asp Thr Pro Ala Tyr Asp Ile Asn Pro
145 150 155 160

ata cgc gtc cag ggc aac gag gcc ctg agc ggg ata aac agc aaa ccc 528
Ile Arg Val Gln Gly Asn Glu Ala Leu Ser Gly Ile Asn Ser Lys Pro
165 170 175

gca aat gac cgg ctg cag ttc gcc ccc ggg aag agg gat ccc gag gca 576
Ala Asn Asp Arg Leu Gln Phe Ala Pro Gly Lys Arg Asp Pro Glu Ala
180 185 190

ata ggg agg att gca gcc gtg ata cac ggc tca acg cct ggt acg ccg 624
Ile Gly Arg Ile Ala Ala Val Ile His Gly Ser Thr Pro Gly Thr Pro
195 200 205

ttt gac gag ctg cca acc acc ggg gaa ata tca tgg gcc cac ggc tat 672
Phe Asp Glu Leu Pro Thr Thr Gly Glu Ile Ser Trp Ala His Gly Tyr
210 215 220

gac ccc gaa aag tac tgc ccc acg tgc tat cgc aag ttc cgg agg cat 720
Asp Pro Glu Lys Tyr Cys Pro Thr Cys Tyr Arg Lys Phe Arg Arg His
225 230 235 240

gcg acg cgc aag agg ata ggg ggc cac gag cac tat ccg ata ttt gcc 768
Ala Thr Arg Lys Arg Ile Gly Gly His Glu His Tyr Pro Ile Phe Ala
245 250 255

gca tgc aac ccg cgg ggc aag aac ccg ggg aac gtc tgg gag ata tcc 816
Ala Cys Asn Pro Arg Gly Lys Asn Pro Gly Asn Val Trp Glu Ile Ser
260 265 270

aca aag gcg cac cat gga aac gag cac ttt gcg gta ttc cca gaa gac 864
Thr Lys Ala His His Gly Asn Glu His Phe Ala Val Phe Pro Glu Asp
275 280 285

ctt gta tcc agg ata gta aag ttt gcc aca aaa gag ggc gat tac gtg 912
Leu Val Ser Arg Ile Val Lys Phe Ala Thr Lys Glu Gly Asp Tyr Val
290 295 300

ctg gac ccg ttt gca ggc agg ggg acc acg gga ata gtc tct gca tgc 960
Leu Asp Pro Phe Ala Gly Arg Gly Thr Thr Gly Ile Val Ser Ala Cys
305 310 315 320

ctc aag agg ggc ttt acc ggg ata gac ctg tat cct gcc aac gtg gca 1008
Leu Lys Arg Gly Phe Thr Gly Ile Asp Leu Tyr Pro Ala Asn Val Ala
325 330 335

agg gcc cgg cgc aac gtg cag gat tcc gcc gat tca cgg ctc tca aaa 1056
Arg Ala Arg Arg Asn Val Gln Asp Ser Ala Asp Ser Arg Leu Ser Lys
340 345 350

aag gtg ctc gac cag ata atg ccc gag agg cag ctg acc ggc tat ttc 1104
Lys Val Leu Asp Gln Ile Met Pro Glu Arg Gln Leu Thr Gly Tyr Phe
355 360 365

tga 1107 



<210> 42 
<211> 368 
<212> PRT 

<213> Cenarchaeum symbiosum 
<400> 42 

Met Ala Arg Gly Pro Ile Leu Ser Glu Lys Phe Gln Ile Leu Gln Gly
1 5 10 15

Asp Ala Arg Glu Val Leu Pro Arg Leu Ala Lys Asn Thr Ala Glu Arg
20 25 30

Gly Arg Tyr Arg Leu Ala Val Thr Ser Pro Pro Tyr Tyr Gly His Arg
35 40 45

Lys Tyr Gly Ser Glu Pro Ser Glu Leu Gly Gln Glu Lys Thr Pro Asp
50 55 60

Glu Phe Ile Glu Glu Leu Ala Gly Val Phe Lys Ser Cys Met Asp Leu
65 70 75 80

Leu Thr Asp Asp Gly Ser Leu Phe Ile Val Ile Gly Asp Thr Arg Arg
85 90 95

Arg Arg His Lys Leu Met Val Pro His Arg Leu Ala Leu Arg Leu Val
100 105 110

Asp Leu Gly Tyr His Phe Gln Glu Asp Ile Ile Trp Tyr Lys Arg Asn
115 120 125

Ala Ile Ser Gln Ser Ser Arg Gln Asn Leu Thr Gln Ala Tyr Glu Phe
130 135 140

Val Leu Val Leu Ser Lys Ser Asp Thr Pro Ala Tyr Asp Ile Asn Pro
145 150 155 160

Ile Arg Val Gln Gly Asn Glu Ala Leu Ser Gly Ile Asn Ser Lys Pro
165 170 175

Ala Asn Asp Arg Leu Gln Phe Ala Pro Gly Lys Arg Asp Pro Glu Ala
180 185 190

Ile Gly Arg Ile Ala Ala Val Ile His Gly Ser Thr Pro Gly Thr Pro
195 200 205

Phe Asp Glu Leu Pro Thr Thr Gly Glu Ile Ser Trp Ala His Gly Tyr
210 215 220

Asp Pro Glu Lys Tyr Cys Pro Thr Cys Tyr Arg Lys Phe Arg Arg His
225 230 235 240

Ala Thr Arg Lys Arg Ile Gly Gly His Glu His Tyr Pro Ile Phe Ala
245 250 255

Ala Cys Asn Pro Arg Gly Lys Asn Pro Gly Asn Val Trp Glu Ile Ser
260 265 270

Thr Lys Ala His His Gly Asn Glu His Phe Ala Val Phe Pro Glu Asp
275 280 285

Leu Val Ser Arg Ile Val Lys Phe Ala Thr Lys Glu Gly Asp Tyr Val
290 295 300

Leu Asp Pro Phe Ala Gly Arg Gly Thr Thr Gly Ile Val Ser Ala Cys
305 310 315 320

Leu Lys Arg Gly Phe Thr Gly Ile Asp Leu Tyr Pro Ala Asn Val Ala
325 330 335

Arg Ala Arg Arg Asn Val Gln Asp Ser Ala Asp Ser Arg Leu Ser Lys
340 345 350

Lys Val Leu Asp Gln Ile Met Pro Glu Arg Gln Leu Thr Gly Tyr Phe
355 360 365

<210> 43 

<211> 933 

<212> DNA 

<213> Cenarchaeum symbiosum 

<220> 

<221> CDS 

<222> (1) . . . (933) 

<400> 43 

atg cct agt tac gca gaa ata gca aac gac gta ctt cga cta atg gag 48
Met Pro Ser Tyr Ala Glu Ile Ala Asn Asp Val Leu Arg Leu Met Glu
1 5 10 15

tca gtc ggt gag cag gca cct ggt gta gta ctt cac gac tat ctt tca 96
Ser Val Gly Glu Gln Ala Pro Gly Val Val Leu His Asp Tyr Leu Ser
20 25 30

aaa ttg caa cag tat tcg ggg agg gat aca ata ctg tat gcg acc aac 144
Lys Leu Gln Gln Tyr Ser Gly Arg Asp Thr Ile Leu Tyr Ala Thr Asn
35 40 45

tgg ata acg gac gaa gcg cat acg tct aat gaa gct ctc ata aca aat 192
Trp Ile Thr Asp Glu Ala His Thr Ser Asn Glu Ala Leu Ile Thr Asn
50 55 60

ggt gac ctg tat gga ttt atg agg atg atg cgt gat tta aag act aag 240
Gly Asp Leu Tyr Gly Phe Met Arg Met Met Arg Asp Leu Lys Thr Lys
65 70 75 80

aaa tta gat tta ata ctc cac agt ccg ggg ggc tcc gtc gag tcc acc 288
Lys Leu Asp Leu Ile Leu His Ser Pro Gly Gly Ser Val Glu Ser Thr
85 90 95

gaa gca atc gtc tca tac ata cgt gca aaa ttt aaa aat gtc cgg atc 336
Glu Ala Ile Val Ser Tyr Ile Arg Ala Lys Phe Lys Asn Val Arg Ile
100 105 110

att atc cca tat gcc gcg atg tcg gca gct gcg atg ctt gca tgc tca 384
Ile Ile Pro Tyr Ala Ala Met Ser Ala Ala Ala Met Leu Ala Cys Ser
115 120 125

tcg aat tgc ctg gta atg ggt aaa cac tca tcg ata ggt ccc acc gac 432
Ser Asn Cys Leu Val Met Gly Lys His Ser Ser Ile Gly Pro Thr Asp
130 135 140

ccc caa ttt att att cca acc agg acc ggc atg cac ata atg tct gca 480
Pro Gln Phe Ile Ile Pro Thr Arg Thr Gly Met His Ile Met Ser Ala
145 150 155 160

cag ttt cta att agc gag ttt caa gaa gca cag tcg gtg tca gaa aaa 528
Gln Phe Leu Ile Ser Glu Phe Gln Glu Ala Gln Ser Val Ser Glu Lys
165 170 175

cac ccg ggg agg ctc ggc gca tgg ctt cca ctg tta ggg caa tat cct 576
His Pro Gly Arg Leu Gly Ala Trp Leu Pro Leu Leu Gly Gln Tyr Pro
180 185 190

ccc ggg cta att caa aaa tgc att agc agc cag aag cta agt gtg gaa 624
Pro Gly Leu Ile Gln Lys Cys Ile Ser Ser Gln Lys Leu Ser Val Glu
195 200 205

ctt gta caa aaa tgg ctg gct aga tac atg ttt gag aac gag tct gca 672
Leu Val Gln Lys Trp Leu Ala Arg Tyr Met Phe Glu Asn Glu Ser Ala
210 215 220

gcg gta aaa aag tca aaa aaa ata tca gaa ata atg tct tcc tct aaa 720
Ala Val Lys Lys Ser Lys Lys Ile Ser Glu Ile Met Ser Ser Ser Lys
225 230 235 240

aaa tat cac agt cat gga agg cgc ata tcg aga gaa gaa tgt aaa agg 768
Lys Tyr His Ser His Gly Arg Arg Ile Ser Arg Glu Glu Cys Lys Arg
245 250 255

att ggc tta aaa gta act gat ctg gaa gat gaa caa gaa ttt caa gat 816
Ile Gly Leu Lys Val Thr Asp Leu Glu Asp Glu Gln Glu Phe Gln Asp
260 265 270

ctg gtg ctg tca gta ttt cat gcg gca aat acc atg ttt cag tat act 864
Leu Val Leu Ser Val Phe His Ala Ala Asn Thr Met Phe Gln Tyr Thr
275 280 285

cca gtc aac aaa att atc atg aat cac ctc ggt aat acc gtc gtt gag 912
Pro Val Asn Lys Ile Ile Met Asn His Leu Gly Asn Thr Val Val Glu
290 295 300

aca ctg cca aca cca cgg taa 933
Thr Leu Pro Thr Pro Arg *
305 310



<210> 44
<211> 310
<212> PRT
<213> Cenarchaeum symbiosum



<400> 44 

Met Pro Ser Tyr Ala Glu Ile Ala Asn Asp Val Leu Arg Leu Met Glu
1 5 10 15

Ser Val Gly Glu Gln Ala Pro Gly Val Val Leu His Asp Tyr Leu Ser
20 25 30

Lys Leu Gln Gln Tyr Ser Gly Arg Asp Thr Ile Leu Tyr Ala Thr Asn
35 40 45

Trp Ile Thr Asp Glu Ala His Thr Ser Asn Glu Ala Leu Ile Thr Asn
50 55 60

Gly Asp Leu Tyr Gly Phe Met Arg Met Met Arg Asp Leu Lys Thr Lys
65 70 75 80

Lys Leu Asp Leu Ile Leu His Ser Pro Gly Gly Ser Val Glu Ser Thr
85 90 95

Glu Ala Ile Val Ser Tyr Ile Arg Ala Lys Phe Lys Asn Val Arg Ile
100 105 110

Ile Ile Pro Tyr Ala Ala Met Ser Ala Ala Ala Met Leu Ala Cys Ser
115 120 125

Ser Asn Cys Leu Val Met Gly Lys His Ser Ser Ile Gly Pro Thr Asp
130 135 140

Pro Gln Phe Ile Ile Pro Thr Arg Thr Gly Met His Ile Met Ser Ala
145 150 155 160

Gln Phe Leu Ile Ser Glu Phe Gln Glu Ala Gln Ser Val Ser Glu Lys
165 170 175

His Pro Gly Arg Leu Gly Ala Trp Leu Pro Leu Leu Gly Gln Tyr Pro
180 185 190

Pro Gly Leu Ile Gln Lys Cys Ile Ser Ser Gln Lys Leu Ser Val Glu
195 200 205

Leu Val Gln Lys Trp Leu Ala Arg Tyr Met Phe Glu Asn Glu Ser Ala
210 215 220

Ala Val Lys Lys Ser Lys Lys Ile Ser Glu Ile Met Ser Ser Ser Lys
225 230 235 240

Lys Tyr His Ser His Gly Arg Arg Ile Ser Arg Glu Glu Cys Lys Arg
245 250 255

Ile Gly Leu Lys Val Thr Asp Leu Glu Asp Glu Gln Glu Phe Gln Asp
260 265 270

Leu Val Leu Ser Val Phe His Ala Ala Asn Thr Met Phe Gln Tyr Thr
275 280 285

Pro Val Asn Lys Ile Ile Met Asn His Leu Gly Asn Thr Val Val Glu
290 295 300

Thr Leu Pro Thr Pro Arg
305 310



<210> 45 
<211> 1305 
<212> DNA 

<213> Cenarchaeum symbiosum 



<220> 

<221> CDS 

<222> (1) . . . (1305) 



<400> 45 

gtg gat ctg gaa cgc gag tac agg gca aag acc ggc ggc tcg gcc cgg 48
Met Asp Leu Glu Arg Glu Tyr Arg Ala Lys Thr Gly Gly Ser Ala Arg
1 5 10 15

atc ttt gcc agg tcg aaa aag tac cac gtc ggc ggg gtc agc cac aac 96
Ile Phe Ala Arg Ser Lys Lys Tyr His Val Gly Gly Val Ser His Asn
20 25 30

ata agg ttc tac gag ccg tat ccg ttt gtg aca agg tcc gcg agc ggc 144
Ile Arg Phe Tyr Glu Pro Tyr Pro Phe Val Thr Arg Ser Ala Ser Gly
35 40 45

aag cac ctc gtc gac gtg gac ggg aac aag tat gta gac tac tgg atg 192
Lys His Leu Val Asp Val Asp Gly Asn Lys Tyr Val Asp Tyr Trp Met
50 55 60

ggg cac tgg agc ctg ata ctg ggg cac gcg ccg gcg cca gtc agg tcg 240
Gly His Trp Ser Leu Ile Leu Gly His Ala Pro Ala Pro Val Arg Ser
65 70 75 80

gca gta gag ggg cag ctt cgc cgc ggc tgg atc cac ggg acc gtc aac 288
Ala Val Glu Gly Gln Leu Arg Arg Gly Trp Ile His Gly Thr Val Asn
85 90 95

gag cag acg atg aat ctc tcg gag ata ata cgc ggc gcg gta agc gtg 336
Glu Gln Thr Met Asn Leu Ser Glu Ile Ile Arg Gly Ala Val Ser Val
100 105 110

gca gaa aag aca agg tac gtc acg tcg ggg acg gag gcc gtc atg tat 384
Ala Glu Lys Thr Arg Tyr Val Thr Ser Gly Thr Glu Ala Val Met Tyr
115 120 125

gcg gca agg ctg gcg cgc gcg cat acg ggc aga aaa ata ata gca aag 432
Ala Ala Arg Leu Ala Arg Ala His Thr Gly Arg Lys Ile Ile Ala Lys
130 135 140

gcg gac ggc ggc tgg cac ggg tac gcg tcg ggg ctg ctc aag tcg gtc 480
Ala Asp Gly Gly Trp His Gly Tyr Ala Ser Gly Leu Leu Lys Ser Val
145 150 155 160

aac tgg ccg tat gat gtg ccc gag agc ggg ggg ctc gtc gac gaa gag 528
Asn Trp Pro Tyr Asp Val Pro Glu Ser Gly Gly Leu Val Asp Glu Glu
165 170 175

cac tct ata tcc att ccg tac aac gat ctt gaa ggt tcc ctg gat gtt 576
His Ser Ile Ser Ile Pro Tyr Asn Asp Leu Glu Gly Ser Leu Asp Val
180 185 190

ctt ggg cgc gca ggc gac gac ttg gca tgc gtg ata atc gag ccg ctg 624
Leu Gly Arg Ala Gly Asp Asp Leu Ala Cys Val Ile Ile Glu Pro Leu
195 200 205

ctg ggc ggc ggc ggc tgc ata ccg gcg gat gag gac tat ctg cgc ggc 672
Leu Gly Gly Gly Gly Cys Ile Pro Ala Asp Glu Asp Tyr Leu Arg Gly
210 215 220

ata cag gag ttt gtg cat tca agg ggc gcg ctg ctt gtc ctc gac gag 720
Ile Gln Glu Phe Val His Ser Arg Gly Ala Leu Leu Val Leu Asp Glu
225 230 235 240

ata gtg aca ggg ttc cgg ttt agg ttt ggc tgc gcg tat gct gca gca 768
Ile Val Thr Gly Phe Arg Phe Arg Phe Gly Cys Ala Tyr Ala Ala Ala
245 250 255

ggg ctg gac ccc gat ata gtg gcg ctc ggc aag ata gtc ggg ggc gga 816
Gly Leu Asp Pro Asp Ile Val Ala Leu Gly Lys Ile Val Gly Gly Gly
260 265 270

ttc ccc ata ggg gtg ata tgc ggc aag gac gag gtg atg gaa atc tcc 864
Phe Pro Ile Gly Val Ile Cys Gly Lys Asp Glu Val Met Glu Ile Ser
275 280 285

aac act ata tcg cat gca aag tcc gac agg gcg tac atc ggc ggc ggc 912
Asn Thr Ile Ser His Ala Lys Ser Asp Arg Ala Tyr Ile Gly Gly Gly
290 295 300

aca ttc tct gca aac ccc gcc acg atg aca gcg ggc gcg gca gcg ctc 960
Thr Phe Ser Ala Asn Pro Ala Thr Met Thr Ala Gly Ala Ala Ala Leu
305 310 315 320

ggg gag ctc aaa aag aga aag ggc aca ata tac ccg agg ata aac tcc 1008
Gly Glu Leu Lys Lys Arg Lys Gly Thr Ile Tyr Pro Arg Ile Asn Ser
325 330 335

atg ggg gac gac gca agg gac aag ctc tca aag ata ttt ggg aac agg 1056
Met Gly Asp Asp Ala Arg Asp Lys Leu Ser Lys Ile Phe Gly Asn Arg
340 345 350

gta tcc gtg acc gga agg ggc tcg ctg ttc atg act cac ttt gtt caa 1104
Val Ser Val Thr Gly Arg Gly Ser Leu Phe Met Thr His Phe Val Gln
355 360 365

gat ggc gcc ggc agg gtc tca aat gct gca gat gcg gca gcc tgc gat 1152
Asp Gly Ala Gly Arg Val Ser Asn Ala Ala Asp Ala Ala Ala Cys Asp
370 375 380

gtt gag ctg ctg cac agg tac cac ctg gac atg atc acc cgg gac ggc 1200
Val Glu Leu Leu His Arg Tyr His Leu Asp Met Ile Thr Arg Asp Gly
385 390 395 400

ata ttc ttt ctg ccg ggc aag ctg ggg gcc ata tcg gcg gcg cac tca 1248
Ile Phe Phe Leu Pro Gly Lys Leu Gly Ala Ile Ser Ala Ala His Ser
405 410 415

aag gcc gac ctc aag acc atg tat tcc gca tca gag cgc ttt gca gaa 1296
Lys Ala Asp Leu Lys Thr Met Tyr Ser Ala Ser Glu Arg Phe Ala Glu
420 425 430

ggc cta tga 1305
Gly Leu *

<210> 46 
<211> 434 
<212> PRT 

<213> Cenarchaeum symbiosum 



<400> 46 

Met Asp Leu Glu Arg Glu Tyr Arg Ala Lys Thr Gly Gly Ser Ala Arg
1 5 10 15

Ile Phe Ala Arg Ser Lys Lys Tyr His Val Gly Gly Val Ser His Asn
20 25 30

Ile Arg Phe Tyr Glu Pro Tyr Pro Phe Val Thr Arg Ser Ala Ser Gly
35 40 45

Lys His Leu Val Asp Val Asp Gly Asn Lys Tyr Val Asp Tyr Trp Met
50 55 60

Gly His Trp Ser Leu Ile Leu Gly His Ala Pro Ala Pro Val Arg Ser
65 70 75 80

Ala Val Glu Gly Gln Leu Arg Arg Gly Trp Ile His Gly Thr Val Asn
85 90 95

Glu Gln Thr Met Asn Leu Ser Glu Ile Ile Arg Gly Ala Val Ser Val
100 105 110

Ala Glu Lys Thr Arg Tyr Val Thr Ser Gly Thr Glu Ala Val Met Tyr
115 120 125

Ala Ala Arg Leu Ala Arg Ala His Thr Gly Arg Lys Ile Ile Ala Lys
130 135 140

Ala Asp Gly Gly Trp His Gly Tyr Ala Ser Gly Leu Leu Lys Ser Val
145 150 155 160

Asn Trp Pro Tyr Asp Val Pro Glu Ser Gly Gly Leu Val Asp Glu Glu
165 170 175

His Ser Ile Ser Ile Pro Tyr Asn Asp Leu Glu Gly Ser Leu Asp Val
180 185 190

Leu Gly Arg Ala Gly Asp Asp Leu Ala Cys Val Ile Ile Glu Pro Leu
195 200 205

Leu Gly Gly Gly Gly Cys Ile Pro Ala Asp Glu Asp Tyr Leu Arg Gly
210 215 220

Ile Gln Glu Phe Val His Ser Arg Gly Ala Leu Leu Val Leu Asp Glu
225 230 235 240

Ile Val Thr Gly Phe Arg Phe Arg Phe Gly Cys Ala Tyr Ala Ala Ala
245 250 255

Gly Leu Asp Pro Asp Ile Val Ala Leu Gly Lys Ile Val Gly Gly Gly
260 265 270

Phe Pro Ile Gly Val Ile Cys Gly Lys Asp Glu Val Met Glu Ile Ser
275 280 285

Asn Thr Ile Ser His Ala Lys Ser Asp Arg Ala Tyr Ile Gly Gly Gly
290 295 300

Thr Phe Ser Ala Asn Pro Ala Thr Met Thr Ala Gly Ala Ala Ala Leu
305 310 315 320

Gly Glu Leu Lys Lys Arg Lys Gly Thr Ile Tyr Pro Arg Ile Asn Ser
325 330 335

Met Gly Asp Asp Ala Arg Asp Lys Leu Ser Lys Ile Phe Gly Asn Arg
340 345 350

Val Ser Val Thr Gly Arg Gly Ser Leu Phe Met Thr His Phe Val Gln
355 360 365

Asp Gly Ala Gly Arg Val Ser Asn Ala Ala Asp Ala Ala Ala Cys Asp
370 375 380

Val Glu Leu Leu His Arg Tyr His Leu Asp Met Ile Thr Arg Asp Gly
385 390 395 400

Ile Phe Phe Leu Pro Gly Lys Leu Gly Ala Ile Ser Ala Ala His Ser
405 410 415

Lys Ala Asp Leu Lys Thr Met Tyr Ser Ala Ser Glu Arg Phe Ala Glu
420 425 430

Gly Leu



<210> 47 
<211> 807 
<212> DNA 
<213> Cenarchaeum symbiosum

<220>

<221> CDS

<222> (1)...(807)

<400> 47

atg ata ctc ttc ggc aag agc gac ccc gcc gag ctg gtg cgc cag gcg 48
Met Ile Leu Phe Gly Lys Ser Asp Pro Ala Glu Leu Val Arg Gln Ala
1 5 10 15

gac ctc ctg tgc agc aag aac cag ttc agg gcg gca ata ggc ctg tac 96
Asp Leu Leu Cys Ser Lys Asn Gln Phe Arg Ala Ala Ile Gly Leu Tyr
20 25 30

ggg aaa atc ctc aag gac gac ccg cag aac agg ggc gtc ctg cac aaa 144
Gly Lys Ile Leu Lys Asp Asp Pro Gln Asn Arg Gly Val Leu His Lys
35 40 45

aag ggg ctg gcc cag aac agg gca aaa aag tac tct gat gcg atc acg 192
Lys Gly Leu Ala Gln Asn Arg Ala Lys Lys Tyr Ser Asp Ala Ile Thr
50 55 60

tgc ttt gac cgg ctg ctc gag ctt gac aac aag gac gcg ccc gcg tac 240
Cys Phe Asp Arg Leu Leu Glu Leu Asp Asn Lys Asp Ala Pro Ala Tyr
65 70 75 80

aac aac aag gcc ata gcc cag gcc gag ctc gga gac acg gca tcc gcg 288
Asn Asn Lys Ala Ile Ala Gln Ala Glu Leu Gly Asp Thr Ala Ser Ala
85 90 95

ctg gaa aac tac ggc agg gcc atc gag gcc gac ccg cgg tac gcg ccg 336
Leu Glu Asn Tyr Gly Arg Ala Ile Glu Ala Asp Pro Arg Tyr Ala Pro
100 105 110

gcg cgc ttc aac agg gcc gtg ctg ctc gac agg ctg ggc gag cat gag 384
Ala Arg Phe Asn Arg Ala Val Leu Leu Asp Arg Leu Gly Glu His Glu
115 120 125

gag gcg ctg ccg gac ctc gac agg gca gcc gag ctg gac cga cgc aag 432
Glu Ala Leu Pro Asp Leu Asp Arg Ala Ala Glu Leu Asp Arg Arg Lys
130 135 140

ccg aac ccg agg ttc tac aag ggg ata gtg ctc ggc aag atg ggc agg 480
Pro Asn Pro Arg Phe Tyr Lys Gly Ile Val Leu Gly Lys Met Gly Arg
145 150 155 160

cac gaa gag gcg ctg gcc tgc ttc aag ggc gtg tgc aag agg cat ccc 528
His Glu Glu Ala Leu Ala Cys Phe Lys Gly Val Cys Lys Arg His Pro
165 170 175

ggc cac gcc gac tca cag ttc cac gtg ggg ata gag ctt acc gag ctt 576
Gly His Ala Asp Ser Gln Phe His Val Gly Ile Glu Leu Thr Glu Leu
180 185 190

ggc agg cac gcc gag gcc ctc ggg gag ctt gca tca ctg ccc gcg gag 624
Gly Arg His Ala Glu Ala Leu Gly Glu Leu Ala Ser Leu Pro Ala Glu
195 200 205

cac cgc gag aac gcc aat gta ttg tat gcc agg gcg cgc agc ctc tcg 672
His Arg Glu Asn Ala Asn Val Leu Tyr Ala Arg Ala Arg Ser Leu Ser
210 215 220

ggc ctt ggc agg gag gac gaa tcc ata gcg cac ctg caa aag gcg gcc 720
Gly Leu Gly Arg Glu Asp Glu Ser Ile Ala His Leu Gln Lys Ala Ala
225 230 235 240

aaa aaa gat tcc aag acg ata aaa aag tgg gcc cgc gca gaa aag gcc 768
Lys Lys Asp Ser Lys Thr Ile Lys Lys Trp Ala Arg Ala Glu Lys Ala
245 250 255

ttt gac gga ata cgg gac gat ccc ggt tca aaa aga tag 807
Phe Asp Gly Ile Arg Asp Asp Pro Gly Ser Lys Arg *
260 265

<210> 48 
<211> 268 
<212> PRT 

<213> Cenarchaeum symbiosum 
<400> 48 

Met Ile Leu Phe Gly Lys Ser Asp Pro Ala Glu Leu Val Arg Gln Ala
1 5 10 15

Asp Leu Leu Cys Ser Lys Asn Gln Phe Arg Ala Ala Ile Gly Leu Tyr
20 25 30

Gly Lys Ile Leu Lys Asp Asp Pro Gln Asn Arg Gly Val Leu His Lys
35 40 45

Lys Gly Leu Ala Gln Asn Arg Ala Lys Lys Tyr Ser Asp Ala Ile Thr
50 55 60

Cys Phe Asp Arg Leu Leu Glu Leu Asp Asn Lys Asp Ala Pro Ala Tyr
65 70 75 80

Asn Asn Lys Ala Ile Ala Gln Ala Glu Leu Gly Asp Thr Ala Ser Ala
85 90 95

Leu Glu Asn Tyr Gly Arg Ala Ile Glu Ala Asp Pro Arg Tyr Ala Pro
100 105 110

Ala Arg Phe Asn Arg Ala Val Leu Leu Asp Arg Leu Gly Glu His Glu
115 120 125

Glu Ala Leu Pro Asp Leu Asp Arg Ala Ala Glu Leu Asp Arg Arg Lys
130 135 140

Pro Asn Pro Arg Phe Tyr Lys Gly Ile Val Leu Gly Lys Met Gly Arg
145 150 155 160

His Glu Glu Ala Leu Ala Cys Phe Lys Gly Val Cys Lys Arg His Pro
165 170 175

Gly His Ala Asp Ser Gln Phe His Val Gly Ile Glu Leu Thr Glu Leu
180 185 190

Gly Arg His Ala Glu Ala Leu Gly Glu Leu Ala Ser Leu Pro Ala Glu
195 200 205

His Arg Glu Asn Ala Asn Val Leu Tyr Ala Arg Ala Arg Ser Leu Ser
210 215 220

Gly Leu Gly Arg Glu Asp Glu Ser Ile Ala His Leu Gln Lys Ala Ala
225 230 235 240

Lys Lys Asp Ser Lys Thr Ile Lys Lys Trp Ala Arg Ala Glu Lys Ala
245 250 255

Phe Asp Gly Ile Arg Asp Asp Pro Gly Ser Lys Arg
260 265

<210> 49
<211> 708 
<212> DNA 

<213> Cenarchaeum symbiosum 

<220> 

<221> CDS 

<222> (1)...(708)

<400> 49

gtg cgg cag ggg atg act gga aag acc agg acg gcg gtc ctg cgg aac 48
Met Arg Gln Gly Met Thr Gly Lys Thr Arg Thr Ala Val Leu Arg Asn
1 5 10 15

gcc atg act gag gag tcg gct cgg gcc atg ata gag gca aag aag acg 96
Ala Met Thr Glu Glu Ser Ala Arg Ala Met Ile Glu Ala Lys Lys Thr
20 25 30

ggt gcc ttt agg gcc ctt atg agg gcc ccg cgg aaa gaa gac gtc cat 144
Gly Ala Phe Arg Ala Leu Met Arg Ala Pro Arg Lys Glu Asp Val His
35 40 45

gtg cat tct gta aag ctg gtc cac gag gcg ctg atc cgg gtc tcc gcc 192
Val His Ser Val Lys Leu Val His Glu Ala Leu Ile Arg Val Ser Ala
50 55 60

agg tac tct gcg gat ttt ttc aga aag gcg gtt cac ccg atc aag gtg 240
Arg Tyr Ser Ala Asp Phe Phe Arg Lys Ala Val His Pro Ile Lys Val
65 70 75 80

gac cag aac gtg atc gag gtg gtg cta ggc gac ggc gtc ttt ccc ata 288
Asp Gln Asn Val Ile Glu Val Val Leu Gly Asp Gly Val Phe Pro Ile
85 90 95

agg tcc aag tcg cgc ata cac aag acg ctc tcg gca ggg ctc ggc aag 336
Arg Ser Lys Ser Arg Ile His Lys Thr Leu Ser Ala Gly Leu Gly Lys
100 105 110

aac agg gtc gac ctc gag cta gaa gag cat gtc ttt gcg gaa tca gaa 384
Asn Arg Val Asp Leu Glu Leu Glu Glu His Val Phe Ala Glu Ser Glu
115 120 125

ggg atg atg tgc ctt gac cgg cac ggc ggc gag acg gac ttt ccc tac 432
Gly Met Met Cys Leu Asp Arg His Gly Gly Glu Thr Asp Phe Pro Tyr
130 135 140

aag acg ggg ccc ggc gcg gtg gag ccg tac ccg cgg agg ata ctc gat 480
Lys Thr Gly Pro Gly Ala Val Glu Pro Tyr Pro Arg Arg Ile Leu Asp
145 150 155 160

gcg tca gag aat gtg cgg agc ccc gag gtg gag aca gaa gag gcg ctc 528
Ala Ser Glu Asn Val Arg Ser Pro Glu Val Glu Thr Glu Glu Ala Leu
165 170 175

tca aaa cta aaa gag aag ctg cgc ggg ccc ccg cct gac ggc atg cgc 576
Ser Lys Leu Lys Glu Lys Leu Arg Gly Pro Pro Pro Asp Gly Met Arg
180 185 190

gac ctg cgg gag gag ttt gcc gca aag gcg gtg gag gtg gtc tat gta 624
Asp Leu Arg Glu Glu Phe Ala Ala Lys Ala Val Glu Val Val Tyr Val
195 200 205

cca gtc tat gaa tcg cga ctt gtg ggg ccc aaa aaa aag gtc cgc atg 672
Pro Val Tyr Glu Ser Arg Leu Val Gly Pro Lys Lys Lys Val Arg Met
210 215 220

atg cgg att gac gcg gca aga aaa aag atc ctc tag 708
Met Arg Ile Asp Ala Ala Arg Lys Lys Ile Leu *
225 230 235

<210> 50 
<211> 235 
<212> PRT 

<213> Cenarchaeum symbiosum 
<400> 50 

Met Arg Gln Gly Met Thr Gly Lys Thr Arg Thr Ala Val Leu Arg Asn
1 5 10 15

Ala Met Thr Glu Glu Ser Ala Arg Ala Met Ile Glu Ala Lys Lys Thr
20 25 30

Gly Ala Phe Arg Ala Leu Met Arg Ala Pro Arg Lys Glu Asp Val His
35 40 45

Val His Ser Val Lys Leu Val His Glu Ala Leu Ile Arg Val Ser Ala
50 55 60

Arg Tyr Ser Ala Asp Phe Phe Arg Lys Ala Val His Pro Ile Lys Val
65 70 75 80

Asp Gln Asn Val Ile Glu Val Val Leu Gly Asp Gly Val Phe Pro Ile
85 90 95

Arg Ser Lys Ser Arg Ile His Lys Thr Leu Ser Ala Gly Leu Gly Lys
100 105 110

Asn Arg Val Asp Leu Glu Leu Glu Glu His Val Phe Ala Glu Ser Glu
115 120 125

Gly Met Met Cys Leu Asp Arg His Gly Gly Glu Thr Asp Phe Pro Tyr
130 135 140

Lys Thr Gly Pro Gly Ala Val Glu Pro Tyr Pro Arg Arg Ile Leu Asp
145 150 155 160

Ala Ser Glu Asn Val Arg Ser Pro Glu Val Glu Thr Glu Glu Ala Leu
165 170 175

Ser Lys Leu Lys Glu Lys Leu Arg Gly Pro Pro Pro Asp Gly Met Arg
180 185 190

Asp Leu Arg Glu Glu Phe Ala Ala Lys Ala Val Glu Val Val Tyr Val
195 200 205

Pro Val Tyr Glu Ser Arg Leu Val Gly Pro Lys Lys Lys Val Arg Met
210 215 220

Met Arg Ile Asp Ala Ala Arg Lys Lys Ile Leu
225 230 235

<210> 51 
<211> 378 
<212> DNA 

<213> Cenarchaeum symbiosum 

<220> 

<221> CDS 

<222> (1)...(378)
<400> 51

atg agg tcg gag ggc agg ccc gga tac atc gaa aag ttc cta aag agg 48
Met Arg Ser Glu Gly Arg Pro Gly Tyr Ile Glu Lys Phe Leu Lys Arg
1 5 10 15

gcg gac aag gcg ata gac aat gca gtc gag cag ggc gtc aag agg gca 96
Ala Asp Lys Ala Ile Asp Asn Ala Val Glu Gln Gly Val Lys Arg Ala
20 25 30

gac gag ata cta gat gac gca gtc gag ctc ggc aag atc acc gtg ggc 144
Asp Glu Ile Leu Asp Asp Ala Val Glu Leu Gly Lys Ile Thr Val Gly
35 40 45

gag gcg caa aaa aga agc gat gtg ctg ctc aag cag gcc gag cgg gag 192
Glu Ala Gln Lys Arg Ser Asp Val Leu Leu Lys Gln Ala Glu Arg Glu
50 55 60

agc aag cgg ctc aag tca agg ggc gcc aaa aag ctc gaa aag ggc ata 240
Ser Lys Arg Leu Lys Ser Arg Gly Ala Lys Lys Leu Glu Lys Gly Ile
65 70 75 80

ggg gcg gca aaa aag atg gca gcc ggc aag ggc gac gcg cta gag acc 288
Gly Ala Ala Lys Lys Met Ala Ala Gly Lys Gly Asp Ala Leu Glu Thr
85 90 95

ctg gca aag ctc ggc gag ctg aga aag gcg ggg atc ata acg gag aag 336
Leu Ala Lys Leu Gly Glu Leu Arg Lys Ala Gly Ile Ile Thr Glu Lys
100 105 110

gag ttt cgc gcc aag aaa aag aag ctt ctc gcg gag atc tga 378
Glu Phe Arg Ala Lys Lys Lys Lys Leu Leu Ala Glu Ile *
115 120 125

<210> 52 
<211> 125 
<212> PRT 

<213> Cenarchaeum symbiosum 
<400> 52 



Met Arg Ser Glu Gly Arg Pro Gly Tyr Ile Glu Lys Phe Leu Lys Arg
1 5 10 15

Ala Asp Lys Ala Ile Asp Asn Ala Val Glu Gln Gly Val Lys Arg Ala
20 25 30

Asp Glu Ile Leu Asp Asp Ala Val Glu Leu Gly Lys Ile Thr Val Gly
35 40 45

Glu Ala Gln Lys Arg Ser Asp Val Leu Leu Lys Gln Ala Glu Arg Glu
50 55 60

Ser Lys Arg Leu Lys Ser Arg Gly Ala Lys Lys Leu Glu Lys Gly Ile
65 70 75 80

Gly Ala Ala Lys Lys Met Ala Ala Gly Lys Gly Asp Ala Leu Glu Thr
85 90 95

Leu Ala Lys Leu Gly Glu Leu Arg Lys Ala Gly Ile Ile Thr Glu Lys
100 105 110

Glu Phe Arg Ala Lys Lys Lys Lys Leu Leu Ala Glu Ile
115 120 125

<210> 53 
<211> 606 
<212> DNA

<213> Cenarchaeum symbiosum

<220> 

<221> CDS 

<222> (1)...(606)

<400> 53

atg tcc aag acg gag gcc tcc ccg ggg gga tat gcc tgc acg cca tac 48
Met Ser Lys Thr Glu Ala Ser Pro Gly Gly Tyr Ala Cys Thr Pro Tyr
1 5 10 15

acg cac gac cat gcc tcg ata gag ctc aag gag gaa tgg tcc tcg tcg 96
Thr His Asp His Ala Ser Ile Glu Leu Lys Glu Glu Trp Ser Ser Ser
20 25 30

agg aac gta ggc gag atg tac ttt gtg acc gcc act ttc tcg tcc aaa 144
Arg Asn Val Gly Glu Met Tyr Phe Val Thr Ala Thr Phe Ser Ser Lys
35 40 45

agc aag ccg tac ttt gag cag cag gcc agc cac tac ctg ctg gca agg 192
Ser Lys Pro Tyr Phe Glu Gln Gln Ala Ser His Tyr Leu Leu Ala Arg
50 55 60

ttc aaa aac ggc ccc aaa atg ata aag gcg gtg gag ggc cgc ggg ggc 240
Phe Lys Asn Gly Pro Lys Met Ile Lys Ala Val Glu Gly Arg Gly Gly
65 70 75 80

ggc cct tcc tat tta ttc agc atg gac gag gag ata ttc gaa agg gaa 288
Gly Pro Ser Tyr Leu Phe Ser Met Asp Glu Glu Ile Phe Glu Arg Glu
85 90 95

tcc ccc ggg atg agc tat gta tcc atg tac tat ctg gaa tac gga gat 336
Ser Pro Gly Met Ser Tyr Val Ser Met Tyr Tyr Leu Glu Tyr Gly Asp
100 105 110

tcc gag gag gac ata cgc gag gtg gcg tcg gta gtg gca aga aag gag 384
Ser Glu Glu Asp Ile Arg Glu Val Ala Ser Val Val Ala Arg Lys Glu
115 120 125

aag ata ggc agg gcg gga ata ggg cgc atg gat gta tgc tcg agg att 432
Lys Ile Gly Arg Ala Gly Ile Gly Arg Met Asp Val Cys Ser Arg Ile
130 135 140

ccg cca aag ttt gcc ttc ccg tac agc ggg aac att gtg gtg ctc gag 480
Pro Pro Lys Phe Ala Phe Pro Tyr Ser Gly Asn Ile Val Val Leu Glu
145 150 155 160

gta tcc agc gaa aag agc cac cag agc gtc aac aag tac tgc gaa aag 528
Val Ser Ser Glu Lys Ser His Gln Ser Val Asn Lys Tyr Cys Glu Lys
165 170 175

act aga agg gaa gtg atc cgc aag ggg ata acg atg acc aac ctt gta 576
Thr Arg Arg Glu Val Ile Arg Lys Gly Ile Thr Met Thr Asn Leu Val
180 185 190

agc ctg tcg ata ctg gag agg ctc aaa taa 606
Ser Leu Ser Ile Leu Glu Arg Leu Lys *
195 200

<210> 54 
<211> 201 
<212> PRT 

<213> Cenarchaeum symbiosum 
<400> 54 

Met Ser Lys Thr Glu Ala Ser Pro Gly Gly Tyr Ala Cys Thr Pro Tyr
1 5 10 15

Thr His Asp His Ala Ser Ile Glu Leu Lys Glu Glu Trp Ser Ser Ser
20 25 30

Arg Asn Val Gly Glu Met Tyr Phe Val Thr Ala Thr Phe Ser Ser Lys
35 40 45

Ser Lys Pro Tyr Phe Glu Gln Gln Ala Ser His Tyr Leu Leu Ala Arg
50 55 60

Phe Lys Asn Gly Pro Lys Met Ile Lys Ala Val Glu Gly Arg Gly Gly
65 70 75 80

Gly Pro Ser Tyr Leu Phe Ser Met Asp Glu Glu Ile Phe Glu Arg Glu
85 90 95

Ser Pro Gly Met Ser Tyr Val Ser Met Tyr Tyr Leu Glu Tyr Gly Asp
100 105 110

Ser Glu Glu Asp Ile Arg Glu Val Ala Ser Val Val Ala Arg Lys Glu
115 120 125

Lys Ile Gly Arg Ala Gly Ile Gly Arg Met Asp Val Cys Ser Arg Ile
130 135 140

Pro Pro Lys Phe Ala Phe Pro Tyr Ser Gly Asn Ile Val Val Leu Glu
145 150 155 160

Val Ser Ser Glu Lys Ser His Gln Ser Val Asn Lys Tyr Cys Glu Lys
165 170 175

Thr Arg Arg Glu Val Ile Arg Lys Gly Ile Thr Met Thr Asn Leu Val
180 185 190

Ser Leu Ser Ile Leu Glu Arg Leu Lys
195 200

<210> 55 
<211> 822 
<212> DNA 

<213> Cenarchaeum symbiosum 

<220> 

<221> CDS 

<222> (1)...(822)

<400> 55

ttg aaa agt acg ttg gtt cgg cgc tac aag ccc aag ata aag cag acc 48
Met Lys Ser Thr Leu Val Arg Arg Tyr Lys Pro Lys Ile Lys Gln Thr
1 5 10 15

ctc cgc gag gtg ccc ctc aaa aat gtg cat gtg tgg aag gag gcg cag 96
Leu Arg Glu Val Pro Leu Lys Asn Val His Val Trp Lys Glu Ala Gln
20 25 30

gca agg agg ctg gac agg tcc cgg gtg cgg gat atc gca aag tcg atc 144
Ala Arg Arg Leu Asp Arg Ser Arg Val Arg Asp Ile Ala Lys Ser Ile
35 40 45

aga tca gag ggg ctg cag aac ccg ccc gtc ata cag agg ggc ggc agg 192
Arg Ser Glu Gly Leu Gln Asn Pro Pro Val Ile Gln Arg Gly Gly Arg
50 55 60

ggg ctg tac ctc ctc ata tcg ggg cac cac cgg ctt gcg gcc ctc aag 240
Gly Leu Tyr Leu Leu Ile Ser Gly His His Arg Leu Ala Ala Leu Lys
65 70 75 80

tac ctg ggc gca aaa aag tcc aag ttt ctg gtg ata acc aag gat aca 288
Tyr Leu Gly Ala Lys Lys Ser Lys Phe Leu Val Ile Thr Lys Asp Thr
85 90 95

gag tac ggc ctg gat gat gca aag gcc gca tcg gtt gta gag aac ctg 336
Glu Tyr Gly Leu Asp Asp Ala Lys Ala Ala Ser Val Val Glu Asn Leu
100 105 110

cac cgt ctc cag atg agc ccg cgg gag ctt gca gac gca tgc aag ttc 384
His Arg Leu Gln Met Ser Pro Arg Glu Leu Ala Asp Ala Cys Lys Phe
115 120 125

ctg gcc gag cag acg aca aaa tcc gag gcc gca aaa aag ctc ggc atg 432
Leu Ala Glu Gln Thr Thr Lys Ser Glu Ala Ala Lys Lys Leu Gly Met
130 135 140

tcg atg ccc acg ttc aag aaa tac cac ggc ttt gcg ggc gta ccg gac 480
Ser Met Pro Thr Phe Lys Lys Tyr His Gly Phe Ala Gly Val Pro Asp
145 150 155 160

aag atc aag gcg atg gta ccg ggc acc ata tcc cgg gac gag gcg aca 528
Lys Ile Lys Ala Met Val Pro Gly Thr Ile Ser Arg Asp Glu Ala Thr
165 170 175

agg ctc tac cag gcg gtg ccg acc ata tcc cag gcg ctc aag gtg gta 576
Arg Leu Tyr Gln Ala Val Pro Thr Ile Ser Gln Ala Leu Lys Val Val
180 185 190

tca aag ata gca aag ctc gac agg ccg tcg agg cgg atc tac ctg agg 624
Ser Lys Ile Ala Lys Leu Asp Arg Pro Ser Arg Arg Ile Tyr Leu Arg
195 200 205

ttg ctt gcc cag agc ccc cgc tcc ggc cac aag ata ata cta aag agg 672
Leu Leu Ala Gln Ser Pro Arg Ser Gly His Lys Ile Ile Leu Lys Arg
210 215 220

atg cgc aag gtg ggc atc aag aaa aag ata cca ata gag ctg ggc aag 720
Met Arg Lys Val Gly Ile Lys Lys Lys Ile Pro Ile Glu Leu Gly Lys
225 230 235 240

aac ggc gca aga aag ctc tcc agg ctg gcc gag cgc gag ggg aca gac 768
Asn Gly Ala Arg Lys Leu Ser Arg Leu Ala Glu Arg Glu Gly Thr Asp
245 250 255

gag acc cgg ctt gcc aac agg ata gtc cgg gaa tac ctg agg aag cgg 816
Glu Thr Arg Leu Ala Asn Arg Ile Val Arg Glu Tyr Leu Arg Lys Arg
260 265 270

cga tga 822
Arg *

<210> 56

<211> 273 

<212> PRT 

<213> Cenarchaeum symbiosum



<400> 56 

Met Lys Ser Thr Leu Val Arg Arg Tyr Lys Pro Lys Ile Lys Gln Thr
1 5 10 15

Leu Arg Glu Val Pro Leu Lys Asn Val His Val Trp Lys Glu Ala Gln
20 25 30

Ala Arg Arg Leu Asp Arg Ser Arg Val Arg Asp Ile Ala Lys Ser Ile
35 40 45

Arg Ser Glu Gly Leu Gln Asn Pro Pro Val Ile Gln Arg Gly Gly Arg
50 55 60

Gly Leu Tyr Leu Leu Ile Ser Gly His His Arg Leu Ala Ala Leu Lys
65 70 75 80

Tyr Leu Gly Ala Lys Lys Ser Lys Phe Leu Val Ile Thr Lys Asp Thr
85 90 95

Glu Tyr Gly Leu Asp Asp Ala Lys Ala Ala Ser Val Val Glu Asn Leu
100 105 110

His Arg Leu Gln Met Ser Pro Arg Glu Leu Ala Asp Ala Cys Lys Phe
115 120 125

Leu Ala Glu Gln Thr Thr Lys Ser Glu Ala Ala Lys Lys Leu Gly Met
130 135 140

Ser Met Pro Thr Phe Lys Lys Tyr His Gly Phe Ala Gly Val Pro Asp
145 150 155 160

Lys Ile Lys Ala Met Val Pro Gly Thr Ile Ser Arg Asp Glu Ala Thr
165 170 175

Arg Leu Tyr Gln Ala Val Pro Thr Ile Ser Gln Ala Leu Lys Val Val
180 185 190

Ser Lys Ile Ala Lys Leu Asp Arg Pro Ser Arg Arg Ile Tyr Leu Arg
195 200 205

Leu Leu Ala Gln Ser Pro Arg Ser Gly His Lys Ile Ile Leu Lys Arg
210 215 220

Met Arg Lys Val Gly Ile Lys Lys Lys Ile Pro Ile Glu Leu Gly Lys
225 230 235 240

Asn Gly Ala Arg Lys Leu Ser Arg Leu Ala Glu Arg Glu Gly Thr Asp
245 250 255

Glu Thr Arg Leu Ala Asn Arg Ile Val Arg Glu Tyr Leu Arg Lys Arg
260 265 270

Arg



<210> 57 
<211> 669 
<212> DNA 

<213> Cenarchaeum symbiosum 



<220> 

<221> CDS 

<222> (1)...(669)

<400> 57

gtg gcg cga tcg ccc gtg ctg ata ata aac tgc aaa aac tac aag gag 48
Met Ala Arg Ser Pro Val Leu Ile Ile Asn Cys Lys Asn Tyr Lys Glu
1 5 10 15

gcg gcc ggc ggc aga att gac agc cta gcg gcg gca gcc gcc ggg gcg 96
Ala Ala Gly Gly Arg Ile Asp Ser Leu Ala Ala Ala Ala Ala Gly Ala
20 25 30

gcc gca aaa tac ggc gtc agg ata gct ctt gcc ccg ccg cag cac ctg 144
Ala Ala Lys Tyr Gly Val Arg Ile Ala Leu Ala Pro Pro Gln His Leu
35 40 45

ctg ggc gca gta aag ggg gaa gat ctt aca gtt ctg gcg cag cat ata 192
Leu Gly Ala Val Lys Gly Glu Asp Leu Thr Val Leu Ala Gln His Ile
50 55 60

gac gac aag ggg gtt gga agc acc aca gga tat gtc gtg ccg gag ctg 240
Asp Asp Lys Gly Val Gly Ser Thr Thr Gly Tyr Val Val Pro Glu Leu
65 70 75 80

ctg gga gaa tcc ggc gtc tct ggc gcg ctc atc aac cac agc gag cac 288
Leu Gly Glu Ser Gly Val Ser Gly Ala Leu Ile Asn His Ser Glu His
85 90 95

cgc gta tca gct gac cag gtg gca agc ctt gtg ccc agg ctc agg ggt 336
Arg Val Ser Ala Asp Gln Val Ala Ser Leu Val Pro Arg Leu Arg Gly
100 105 110

ctg gat atg atc tcc gtg gtc tgt gta aag gat tcc gcc gag gcg gca 384
Leu Asp Met Ile Ser Val Val Cys Val Lys Asp Ser Ala Glu Ala Ala
115 120 125

aat ctc tcc cgg cac cgg ccc gac tac ata gct atc gag cct ccc gag 432
Asn Leu Ser Arg His Arg Pro Asp Tyr Ile Ala Ile Glu Pro Pro Glu
130 135 140

ctg ata ggc tcg ggc agg tcc gtc tca tcg gag agg ccc gag ctg ata 480
Leu Ile Gly Ser Gly Arg Ser Val Ser Ser Glu Arg Pro Glu Leu Ile
145 150 155 160

ggg gag gca gca gag gcc atc agg ggg gcg gat gga aca aag ctg ctc 528
Gly Glu Ala Ala Glu Ala Ile Arg Gly Ala Asp Gly Thr Lys Leu Leu
165 170 175

tgc ggg gcg ggc ata aca tca ggc gct gat gtg cgc aag gcc ctc gag 576
Cys Gly Ala Gly Ile Thr Ser Gly Ala Asp Val Arg Lys Ala Leu Glu
180 185 190

ctc ggc tcc aag ggg atc ctc gtg gca agc ggg gtg gta aaa tca tca 624
Leu Gly Ser Lys Gly Ile Leu Val Ala Ser Gly Val Val Lys Ser Ser
195 200 205

gac ccc gct gcg gcc ata gcc gag ctg gca cag gcc atg tcc tga 669
Asp Pro Ala Ala Ala Ile Ala Glu Leu Ala Gln Ala Met Ser *
210 215 220

<210> 58 
<211> 222 
<212> PRT 

<213> Cenarchaeum symbiosum 



<400> 58 

Met Ala Arg Ser Pro Val Leu Ile Ile Asn Cys Lys Asn Tyr Lys Glu
1 5 10 15

Ala Ala Gly Gly Arg Ile Asp Ser Leu Ala Ala Ala Ala Ala Gly Ala
20 25 30

Ala Ala Lys Tyr Gly Val Arg Ile Ala Leu Ala Pro Pro Gln His Leu
35 40 45

Leu Gly Ala Val Lys Gly Glu Asp Leu Thr Val Leu Ala Gln His Ile
50 55 60

Asp Asp Lys Gly Val Gly Ser Thr Thr Gly Tyr Val Val Pro Glu Leu
65 70 75 80

Leu Gly Glu Ser Gly Val Ser Gly Ala Leu Ile Asn His Ser Glu His
85 90 95

Arg Val Ser Ala Asp Gln Val Ala Ser Leu Val Pro Arg Leu Arg Gly
100 105 110

Leu Asp Met Ile Ser Val Val Cys Val Lys Asp Ser Ala Glu Ala Ala
115 120 125

Asn Leu Ser Arg His Arg Pro Asp Tyr Ile Ala Ile Glu Pro Pro Glu
130 135 140

Leu Ile Gly Ser Gly Arg Ser Val Ser Ser Glu Arg Pro Glu Leu Ile
145 150 155 160

Gly Glu Ala Ala Glu Ala Ile Arg Gly Ala Asp Gly Thr Lys Leu Leu
165 170 175

Cys Gly Ala Gly Ile Thr Ser Gly Ala Asp Val Arg Lys Ala Leu Glu
180 185 190

Leu Gly Ser Lys Gly Ile Leu Val Ala Ser Gly Val Val Lys Ser Ser
195 200 205

Asp Pro Ala Ala Ala Ile Ala Glu Leu Ala Gln Ala Met Ser
210 215 220

<210> 59 
<211> 549 
<212> DNA 

<213> Cenarchaeum symbiosum

<220>

<221> CDS

<222> (1)...(548)

<400> 59 

atg ctg gat ccc cgg acg cgg ccc cgg gtc gtc aat gtc gtc agc aca 48
Met Leu Asp Pro Arg Thr Arg Pro Arg Val Val Asn Val Val Ser Thr
1 5 10 15

tca gac ctt gta caa agg gtg agc gca aaa aag atg gcc gcc atg ccg 96
Ser Asp Leu Val Gln Arg Val Ser Ala Lys Lys Met Ala Ala Met Pro
20 25 30

tgc tgc atg tat gat gag gcc gta tac ggc ggc agg tgc ggc tac ata 144
Cys Cys Met Tyr Asp Glu Ala Val Tyr Gly Gly Arg Cys Gly Tyr Ile
35 40 45

aag acg ccc ggc atg cag ggg agg gtg act gta ttc att tct ggc aag 192
Lys Thr Pro Gly Met Gln Gly Arg Val Thr Val Phe Ile Ser Gly Lys
50 55 60

atg ata tcc gtc ggc gcc aga tcc gtg agg gcc tcg ttt ggg cag ctg 240
Met Ile Ser Val Gly Ala Arg Ser Val Arg Ala Ser Phe Gly Gln Leu
65 70 75 80

cac gag gcg cgg ctc cac ctg gtg cgc aac ggg gct gcc ggc gac tgc 288
His Glu Ala Arg Leu His Leu Val Arg Asn Gly Ala Ala Gly Asp Cys
85 90 95

aag ata agg ccc gtc gtg cgc aat att gta gcc acg gtg gat gcc ggt 336
Lys Ile Arg Pro Val Val Arg Asn Ile Val Ala Thr Val Asp Ala Gly
100 105 110

agg aat gtt ccc ata gac agg ata tcg tcg cgc atg cct ggc gct gta 384
Arg Asn Val Pro Ile Asp Arg Ile Ser Ser Arg Met Pro Gly Ala Val
115 120 125

tat gat ccc ggg tcg ttt ccc ggg atg ata ctc aag ggg ctg gac agc 432
Tyr Asp Pro Gly Ser Phe Pro Gly Met Ile Leu Lys Gly Leu Asp Ser
130 135 140

tgc agc ttt cta gtc ttt gcg tcg gga aag atg gtg ata gcg ggc gcc 480
Cys Ser Phe Leu Val Phe Ala Ser Gly Lys Met Val Ile Ala Gly Ala
145 150 155 160

aag tcg ccg gat gag ctg cgc agg tcg tcg ttt gac ctg ctg acg cgc 528
Lys Ser Pro Asp Glu Leu Arg Arg Ser Ser Phe Asp Leu Leu Thr Arg
165 170 175

ctc aat aac gcg ggg gcc ta g 549
Leu Asn Asn Ala Gly Ala
180

<210> 60
<211> 182
<212> PRT
<213> Cenarchaeum symbiosum

<400> 60

Met Leu Asp Pro Arg Thr Arg Pro Arg Val Val Asn Val Val Ser Thr
1 5 10 15

Ser Asp Leu Val Gln Arg Val Ser Ala Lys Lys Met Ala Ala Met Pro
20 25 30

Cys Cys Met Tyr Asp Glu Ala Val Tyr Gly Gly Arg Cys Gly Tyr Ile
35 40 45

Lys Thr Pro Gly Met Gln Gly Arg Val Thr Val Phe Ile Ser Gly Lys
50 55 60

Met Ile Ser Val Gly Ala Arg Ser Val Arg Ala Ser Phe Gly Gln Leu
65 70 75 80

His Glu Ala Arg Leu His Leu Val Arg Asn Gly Ala Ala Gly Asp Cys
85 90 95

Lys Ile Arg Pro Val Val Arg Asn Ile Val Ala Thr Val Asp Ala Gly
100 105 110

Arg Asn Val Pro Ile Asp Arg Ile Ser Ser Arg Met Pro Gly Ala Val
115 120 125

Tyr Asp Pro Gly Ser Phe Pro Gly Met Ile Leu Lys Gly Leu Asp Ser
130 135 140

Cys Ser Phe Leu Val Phe Ala Ser Gly Lys Met Val Ile Ala Gly Ala
145 150 155 160

Lys Ser Pro Asp Glu Leu Arg Arg Ser Ser Phe Asp Leu Leu Thr Arg
165 170 175

Leu Asn Asn Ala Gly Ala
180

<210> 61
<211> 2538 
<212> DNA 

<213> Cenarchaeum symbiosum 

<220> 
<221> CDS 

<222> (1)...(2538)
<400> 61

ctg act gca cag gat gaa gag att ccc ccg tca ctg ctt gta tct gca 48
Met Thr Ala Gln Asp Glu Glu Ile Pro Pro Ser Leu Leu Val Ser Ala
1 5 10 15

acc tat gat ggc cag gca agg gcc gtg gtc ctc aag ttc tac gag tcg 96
Thr Tyr Asp Gly Gln Ala Arg Ala Val Val Leu Lys Phe Tyr Glu Ser
20 25 30

gaa tcg caa aag atc atc cac tgg acg gac aac acg ggg cac aag ccc 144
Glu Ser Gln Lys Ile Ile His Trp Thr Asp Asn Thr Gly His Lys Pro
35 40 45

tac tgt tat acg agg ctg ccg ccc tcc gag ctc ggc ttt ctt ggg ggc 192
Tyr Cys Tyr Thr Arg Leu Pro Pro Ser Glu Leu Gly Phe Leu Gly Gly
50 55 60

agg gag gac gtg ctc ggg ata gag cag gtc atg cgg cac gac ctg ata 240
Arg Glu Asp Val Leu Gly Ile Glu Gln Val Met Arg His Asp Leu Ile
65 70 75 80

gcc gac aag gag gtg ccc gtc tcc aag ata acc gtc tct gat cct ctt 288
Ala Asp Lys Glu Val Pro Val Ser Lys Ile Thr Val Ser Asp Pro Leu
85 90 95

gcg ata ggc ggg acc cac tcg gag aag agc atc aga aac gtg ata gac 336
Ala Ile Gly Gly Thr His Ser Glu Lys Ser Ile Arg Asn Val Ile Asp
100 105 110

acg tgg gaa tcc gac ata aag tat tac gag aac tat ctg tat gac gcg 384
Thr Trp Glu Ser Asp Ile Lys Tyr Tyr Glu Asn Tyr Leu Tyr Asp Ala
115 120 125

ggc ctg gta gtg ggc agg tac tat tcg gta tca ggc ggg gag gtg att 432
Gly Leu Val Val Gly Arg Tyr Tyr Ser Val Ser Gly Gly Glu Val Ile
130 135 140

ccg cat gac atg cca ata tcc gac gag gta aaa ctg gcc ctc aag agc 480
Pro His Asp Met Pro Ile Ser Asp Glu Val Lys Leu Ala Leu Lys Ser
145 150 155 160

ctt ctc tgg gac aag ctc ata gac gag ggc atg gcc gac agg aaa gag 528
Leu Leu Trp Asp Lys Leu Ile Asp Glu Gly Met Ala Asp Arg Lys Glu
165 170 175

ttc cgc gag ttc ata gcg ggg tgg gcg gac ctg ctc aac cag ccc ata 576
Phe Arg Glu Phe Ile Ala Gly Trp Ala Asp Leu Leu Asn Gln Pro Ile
180 185 190
ccc cgg ata agg cgc ctc agc ttt gac atc gag gtg gat tca gag gag 624
Pro Arg Ile Arg Arg Leu Ser Phe Asp Ile Glu Val Asp Ser Glu Glu
195 200 205

ggc agg atc ccc gat gcc aag atc tcg gac agg agg gtc aca gca gtg 672
Gly Arg Ile Pro Asp Ala Lys Ile Ser Asp Arg Arg Val Thr Ala Val
210 215 220

ggg ttt gcc gcc acc gac ggc ctc aga aag gtc ctt gtc ctg aag agc 720
Gly Phe Ala Ala Thr Asp Gly Leu Arg Lys Val Leu Val Leu Lys Ser
225 230 235 240

ggc gcg gac gag ggc gca aac gat gtg acc ccc ggg gtc gag gtg gtg 768
Gly Ala Asp Glu Gly Ala Asn Asp Val Thr Pro Gly Val Glu Val Val
245 250 255

ttc tac gac gag gac aag gag gcg gac atg atc cgc gac gcg cta gca 816
Phe Tyr Asp Glu Asp Lys Glu Ala Asp Met Ile Arg Asp Ala Leu Ala
260 265 270

ata ata ggc tcg tac ccg ttt gtg ctt aca tac aac ggg gac gac ttt 864
Ile Ile Gly Ser Tyr Pro Phe Val Leu Thr Tyr Asn Gly Asp Asp Phe
275 280 285

gac atg ccg tac atg tac aat cgg gcc cgg cgc ctc ggc gtg gcg gat 912
Asp Met Pro Tyr Met Tyr Asn Arg Ala Arg Arg Leu Gly Val Ala Asp
290 295 300

tcc gac ata ccc ctg tac atg atg cgg gat tcg gcc acg ctc cgg cac 960
Ser Asp Ile Pro Leu Tyr Met Met Arg Asp Ser Ala Thr Leu Arg His
305 310 315 320

ggc gtc cat ctg gac ctg tac agg acc ttc tcg aac agg tcg ttc cag 1008
Gly Val His Leu Asp Leu Tyr Arg Thr Phe Ser Asn Arg Ser Phe Gln
325 330 335

ctg tat gca ttt gcg gca aag tat aca gat tac tcc ctg aac agc gtg 1056
Leu Tyr Ala Phe Ala Ala Lys Tyr Thr Asp Tyr Ser Leu Asn Ser Val
340 345 350

tcc aag gcg atg ctc ggc gag ggc aag gtc gat tat ggc gtg tct ctc 1104
Ser Lys Ala Met Leu Gly Glu Gly Lys Val Asp Tyr Gly Val Ser Leu
355 360 365

ggg gat ctc act cta tac cag act gca aac tat tgc tat cat gac gcg 1152
Gly Asp Leu Thr Leu Tyr Gln Thr Ala Asn Tyr Cys Tyr His Asp Ala
370 375 380

cgc ctg acg ctg gag ctt agc acc ttt ggg aac gag ata ctg atg gac 1200
Arg Leu Thr Leu Glu Leu Ser Thr Phe Gly Asn Glu Ile Leu Met Asp
385 390 395 400

ctc ctg gtg gtg acc agc agg att gcc cgg atg ccc atc gat gat atg 1248
Leu Leu Val Val Thr Ser Arg Ile Ala Arg Met Pro Ile Asp Asp Met
405 410 415

tcc cgc atg ggc gtc tcg cag tgg ata agg agc ctg ctg tac tat gag 1296
Ser Arg Met Gly Val Ser Gln Trp Ile Arg Ser Leu Leu Tyr Tyr Glu
420 425 430

cac agg cag cgc aac gcg ctg ata ccc cgc agg gac gag ctg gaa aag 1344
His Arg Gln Arg Asn Ala Leu Ile Pro Arg Arg Asp Glu Leu Glu Lys
435 440 445

agg tct caa cag gta agc aac gac gcc gta atc aag gac aaa aag ttc 1392
Arg Ser Gln Gln Val Ser Asn Asp Ala Val Ile Lys Asp Lys Lys Phe
450 455 460

cgc ggt ggt ctc gta gtc gag cct gaa gag ggc ata cac ttt gat gtt 1440
Arg Gly Gly Leu Val Val Glu Pro Glu Glu Gly Ile His Phe Asp Val
465 470 475 480

aca gtt atg gat ttt gca agc ctg tat cct agc ata ata aag gtg cga 1488
Thr Val Met Asp Phe Ala Ser Leu Tyr Pro Ser Ile Ile Lys Val Arg
485 490 495

aac ctc tcg tac gag acc gtc agg tgc gtt cat ccc gaa tgc aga aag 1536
Asn Leu Ser Tyr Glu Thr Val Arg Cys Val His Pro Glu Cys Arg Lys
500 505 510

aac acc atc ccc gat acc aac cac tgg gta tgc acg aaa aac aac ggg 1584
Asn Thr Ile Pro Asp Thr Asn His Trp Val Cys Thr Lys Asn Asn Gly
515 520 525

ctt aca tcg atg ata ata gga tcg ctc cgc gac ctg cgc gtc aac tat 1632
Leu Thr Ser Met Ile Ile Gly Ser Leu Arg Asp Leu Arg Val Asn Tyr
530 535 540

tac aag agc ctc tca aag agc cag tct ata acg gag gag cag cgg cag 1680
Tyr Lys Ser Leu Ser Lys Ser Gln Ser Ile Thr Glu Glu Gln Arg Gln
545 550 555 560

cag tat act gtg atc agc cag gcc ctc aag gtg gtg cta aac gca agc 1728
Gln Tyr Thr Val Ile Ser Gln Ala Leu Lys Val Val Leu Asn Ala Ser
565 570 575

tac ggg gtg atg ggc gcc gag ata ttc ccg ctg tac ttt ctg cct gcc 1776
Tyr Gly Val Met Gly Ala Glu Ile Phe Pro Leu Tyr Phe Leu Pro Ala
580 585 590

gcc gag gcc acc acg gcg gtc ggg cgc tat atc atc atg cag acc ata 1824
Ala Glu Ala Thr Thr Ala Val Gly Arg Tyr Ile Ile Met Gln Thr Ile
595 600 605

tcg cac tgc gag cag atg ggc gta aag gtg ctg tac ggg gac acc gat 1872
Ser His Cys Glu Gln Met Gly Val Lys Val Leu Tyr Gly Asp Thr Asp
610 615 620

tcg ctg ttc ata aag aat cca gag gag cgg cag atc cat gat ata gtc 1920
Ser Leu Phe Ile Lys Asn Pro Glu Glu Arg Gln Ile His Asp Ile Val
625 630 635 640

gag cac gcc aaa aag gag cac ggc gtc gag ctc gag gtg gac aaa gag 1968
Glu His Ala Lys Lys Glu His Gly Val Glu Leu Glu Val Asp Lys Glu
645 650 655
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tac agg tat gtc gtg eta tct aac agg aag aaa aac tat ttc ggg gtg 2016 
Tyr Arg Tyr Val Val Leu Ser Asn Arg Lys Lys Asn Tyr Phe Gly Val 
660 665 670 

aca aag tcc ggc aag gtc gac gtc aag ggc ctg acg ggg aaa aag tcg 2064
Thr Lys Ser Gly Lys Val Asp Val Lys Gly Leu Thr Gly Lys Lys Ser
675 680 685

cac acg ccc ccg ttc ata aag gag ctg ttc tat tcg ctg ctc gac ata 2112
His Thr Pro Pro Phe Ile Lys Glu Leu Phe Tyr Ser Leu Leu Asp Ile
690 695 700

ctg tcg gct gta cag acc gag gac gag ttt gaa tcg gca aag cta aag 2160
Leu Ser Ala Val Gln Thr Glu Asp Glu Phe Glu Ser Ala Lys Leu Lys
705 710 715 720

atc tca aag gcc ata gcg gca tcc ggg aag agg ctg gag gag agg ggg 2208
Ile Ser Lys Ala Ile Ala Ala Ser Gly Lys Arg Leu Glu Glu Arg Gly
725 730 735

gtc ccg ctg gcg gat ctg gcg ttc aat gtg atg ata agc aag gcg ccc 2256
Val Pro Leu Ala Asp Leu Ala Phe Asn Val Met Ile Ser Lys Ala Pro
740 745 750

tct gaa tac gta aag acc gtc ccg cag cac ata cgg gcg gcc aga ctg 2304
Ser Glu Tyr Val Lys Thr Val Pro Gln His Ile Arg Ala Ala Arg Leu
755 760 765

ctc gag aac gca agg gag gtc aaa aaa ggc gac ata ata tcg tac gta 2352
Leu Glu Asn Ala Arg Glu Val Lys Lys Gly Asp Ile Ile Ser Tyr Val
770 775 780

aag gtg atg aac aag aca ggc gtc aag cct gtc gag atg gcc cag gca 2400
Lys Val Met Asn Lys Thr Gly Val Lys Pro Val Glu Met Ala Gln Ala
785 790 795 800

gga gag gtg gac acg tca aag tat cta gag ttc atg gag tct act ctg 2448
Gly Glu Val Asp Thr Ser Lys Tyr Leu Glu Phe Met Glu Ser Thr Leu
805 810 815

gac cag ctc acc tcg tcc atg ggc ctt gac ttt gac gag atg ctg ggc 2496
Asp Gln Leu Thr Ser Ser Met Gly Leu Asp Phe Asp Glu Met Leu Gly
820 825 830

aag cca aag cag act gga atg gag cag ttc ttt ttc aaa tga 2538
Lys Pro Lys Gln Thr Gly Met Glu Gln Phe Phe Phe Lys *
835 840 845

<210> 62 
<211> 845 
<212> PRT 

<213> Cenarchaeum symbiosum 
<400> 62 

Met Thr Ala Gln Asp Glu Glu Ile Pro Pro Ser Leu Leu Val Ser Ala
1 5 10 15

Thr Tyr Asp Gly Gln Ala Arg Ala Val Val Leu Lys Phe Tyr Glu Ser
20 25 30

Glu Ser Gln Lys Ile Ile His Trp Thr Asp Asn Thr Gly His Lys Pro
35 40 45

Tyr Cys Tyr Thr Arg Leu Pro Pro Ser Glu Leu Gly Phe Leu Gly Gly
50 55 60

Arg Glu Asp Val Leu Gly Ile Glu Gln Val Met Arg His Asp Leu Ile
65 70 75 80

Ala Asp Lys Glu Val Pro Val Ser Lys Ile Thr Val Ser Asp Pro Leu
85 90 95

Ala Ile Gly Gly Thr His Ser Glu Lys Ser Ile Arg Asn Val Ile Asp
100 105 110

Thr Trp Glu Ser Asp Ile Lys Tyr Tyr Glu Asn Tyr Leu Tyr Asp Ala
115 120 125

Gly Leu Val Val Gly Arg Tyr Tyr Ser Val Ser Gly Gly Glu Val Ile
130 135 140

Pro His Asp Met Pro Ile Ser Asp Glu Val Lys Leu Ala Leu Lys Ser
145 150 155 160

Leu Leu Trp Asp Lys Leu Ile Asp Glu Gly Met Ala Asp Arg Lys Glu
165 170 175

Phe Arg Glu Phe Ile Ala Gly Trp Ala Asp Leu Leu Asn Gln Pro Ile
180 185 190

Pro Arg Ile Arg Arg Leu Ser Phe Asp Ile Glu Val Asp Ser Glu Glu
195 200 205

Gly Arg Ile Pro Asp Ala Lys Ile Ser Asp Arg Arg Val Thr Ala Val
210 215 220

Gly Phe Ala Ala Thr Asp Gly Leu Arg Lys Val Leu Val Leu Lys Ser
225 230 235 240

Gly Ala Asp Glu Gly Ala Asn Asp Val Thr Pro Gly Val Glu Val Val
245 250 255

Phe Tyr Asp Glu Asp Lys Glu Ala Asp Met Ile Arg Asp Ala Leu Ala
260 265 270

Ile Ile Gly Ser Tyr Pro Phe Val Leu Thr Tyr Asn Gly Asp Asp Phe
275 280 285

Asp Met Pro Tyr Met Tyr Asn Arg Ala Arg Arg Leu Gly Val Ala Asp
290 295 300

Ser Asp Ile Pro Leu Tyr Met Met Arg Asp Ser Ala Thr Leu Arg His
305 310 315 320

Gly Val His Leu Asp Leu Tyr Arg Thr Phe Ser Asn Arg Ser Phe Gln
325 330 335

Leu Tyr Ala Phe Ala Ala Lys Tyr Thr Asp Tyr Ser Leu Asn Ser Val
340 345 350

Ser Lys Ala Met Leu Gly Glu Gly Lys Val Asp Tyr Gly Val Ser Leu
355 360 365

Gly Asp Leu Thr Leu Tyr Gln Thr Ala Asn Tyr Cys Tyr His Asp Ala
370 375 380

Arg Leu Thr Leu Glu Leu Ser Thr Phe Gly Asn Glu Ile Leu Met Asp
385 390 395 400

Leu Leu Val Val Thr Ser Arg Ile Ala Arg Met Pro Ile Asp Asp Met
405 410 415

Ser Arg Met Gly Val Ser Gln Trp Ile Arg Ser Leu Leu Tyr Tyr Glu
420 425 430

His Arg Gln Arg Asn Ala Leu Ile Pro Arg Arg Asp Glu Leu Glu Lys
435 440 445

Arg Ser Gln Gln Val Ser Asn Asp Ala Val Ile Lys Asp Lys Lys Phe
450 455 460

Arg Gly Gly Leu Val Val Glu Pro Glu Glu Gly Ile His Phe Asp Val
465 470 475 480

Thr Val Met Asp Phe Ala Ser Leu Tyr Pro Ser Ile Ile Lys Val Arg
485 490 495

Asn Leu Ser Tyr Glu Thr Val Arg Cys Val His Pro Glu Cys Arg Lys
500 505 510

Asn Thr Ile Pro Asp Thr Asn His Trp Val Cys Thr Lys Asn Asn Gly
515 520 525

Leu Thr Ser Met Ile Ile Gly Ser Leu Arg Asp Leu Arg Val Asn Tyr
530 535 540

Tyr Lys Ser Leu Ser Lys Ser Gln Ser Ile Thr Glu Glu Gln Arg Gln
545 550 555 560

Gln Tyr Thr Val Ile Ser Gln Ala Leu Lys Val Val Leu Asn Ala Ser
565 570 575

Tyr Gly Val Met Gly Ala Glu Ile Phe Pro Leu Tyr Phe Leu Pro Ala
580 585 590

Ala Glu Ala Thr Thr Ala Val Gly Arg Tyr Ile Ile Met Gln Thr Ile
595 600 605

Ser His Cys Glu Gln Met Gly Val Lys Val Leu Tyr Gly Asp Thr Asp
610 615 620

Ser Leu Phe Ile Lys Asn Pro Glu Glu Arg Gln Ile His Asp Ile Val
625 630 635 640

Glu His Ala Lys Lys Glu His Gly Val Glu Leu Glu Val Asp Lys Glu
645 650 655

Tyr Arg Tyr Val Val Leu Ser Asn Arg Lys Lys Asn Tyr Phe Gly Val
660 665 670

Thr Lys Ser Gly Lys Val Asp Val Lys Gly Leu Thr Gly Lys Lys Ser
675 680 685

His Thr Pro Pro Phe Ile Lys Glu Leu Phe Tyr Ser Leu Leu Asp Ile
690 695 700

Leu Ser Ala Val Gln Thr Glu Asp Glu Phe Glu Ser Ala Lys Leu Lys
705 710 715 720

Ile Ser Lys Ala Ile Ala Ala Ser Gly Lys Arg Leu Glu Glu Arg Gly
725 730 735

Val Pro Leu Ala Asp Leu Ala Phe Asn Val Met Ile Ser Lys Ala Pro
740 745 750

Ser Glu Tyr Val Lys Thr Val Pro Gln His Ile Arg Ala Ala Arg Leu
755 760 765

Leu Glu Asn Ala Arg Glu Val Lys Lys Gly Asp Ile Ile Ser Tyr Val
770 775 780

Lys Val Met Asn Lys Thr Gly Val Lys Pro Val Glu Met Ala Gln Ala
785 790 795 800

Gly Glu Val Asp Thr Ser Lys Tyr Leu Glu Phe Met Glu Ser Thr Leu
805 810 815

Asp Gln Leu Thr Ser Ser Met Gly Leu Asp Phe Asp Glu Met Leu Gly
820 825 830

Lys Pro Lys Gln Thr Gly Met Glu Gln Phe Phe Phe Lys
835 840 845

<210> 63 
<211> 642 
<212> DNA 

<213> Cenarchaeum symbiosum 

<220> 

<221> CDS 

<222> (1) . . . (642) 

<400> 63 

ttg ccc gtt atg tgt gcg gtc tcc acg cgc ggc cct gac gcg gcc tgt 48
Met Pro Val Met Cys Ala Val Ser Thr Arg Gly Pro Asp Ala Ala Cys
1 5 10 15

tgt ttt atg gtt tcg tac acc ggg gca tat acc ata ata tgc cgg gcg 96
Cys Phe Met Val Ser Tyr Thr Gly Ala Tyr Thr Ile Ile Cys Arg Ala
20 25 30

gtg gca cca tgg ccg ttg agc ggt ttt gag cgc ccg tcc tgg gac gaa 144
Val Ala Pro Trp Pro Leu Ser Gly Phe Glu Arg Pro Ser Trp Asp Glu
35 40 45

tat ttc atg ctg cag gcg gag ctg gca aag ctc cga tcc aac tgc atg 192
Tyr Phe Met Leu Gln Ala Glu Leu Ala Lys Leu Arg Ser Asn Cys Met
50 55 60

gtc aga aag gtg ggg gcc gtc ata gtc agg gat cac agg cag ctg gcc 240
Val Arg Lys Val Gly Ala Val Ile Val Arg Asp His Arg Gln Leu Ala
65 70 75 80

aca gga tac aac ggg acg ccc ccc ggc gta aag aac tgc ttc gag ggc 288
Thr Gly Tyr Asn Gly Thr Pro Pro Gly Val Lys Asn Cys Phe Glu Gly
85 90 95

ggg tgc gaa agg tgc ata gag cgc atg gag ggc aag atc cgc tca ggc 336
Gly Cys Glu Arg Cys Ile Glu Arg Met Glu Gly Lys Ile Arg Ser Gly
100 105 110

gag ggc ctg gac cgg tgc ctg tgc aac cat gca gag gcc aac gcg ata 384
Glu Gly Leu Asp Arg Cys Leu Cys Asn His Ala Glu Ala Asn Ala Ile
115 120 125

atg cac tgt gcg ata ctg gga ata ggc gca ggg gga ggc aac gcc acc 432
Met His Cys Ala Ile Leu Gly Ile Gly Ala Gly Gly Gly Asn Ala Thr
130 135 140

atg tat acg acg ttc tct ccg tgt tta gag tgc aca aag atg gcg gtg 480
Met Tyr Thr Thr Phe Ser Pro Cys Leu Glu Cys Thr Lys Met Ala Val
145 150 155 160

acc ata gga atc agg cgg ttt gtc tgc ctg gat aca tat ccg gag aac 528
Thr Ile Gly Ile Arg Arg Phe Val Cys Leu Asp Thr Tyr Pro Glu Asn
165 170 175

gcc tcc aag ctg gta aaa gat gca tcg gcc agc ata acc atg atg gac 576
Ala Ser Lys Leu Val Lys Asp Ala Ser Ala Ser Ile Thr Met Met Asp
180 185 190

aag gag aag atc aca tac tgg gcg tca agg atg ccc ggg gga aca aag 624
Lys Glu Lys Ile Thr Tyr Trp Ala Ser Arg Met Pro Gly Gly Thr Lys
195 200 205

gag gtg ccg gtg cgc tga 642
Glu Val Pro Val Arg *
210

<210> 64 
<211> 213 
<212> PRT 

<213> Cenarchaeum symbiosum 
<400> 64

Met Pro Val Met Cys Ala Val Ser Thr Arg Gly Pro Asp Ala Ala Cys
1 5 10 15

Cys Phe Met Val Ser Tyr Thr Gly Ala Tyr Thr Ile Ile Cys Arg Ala
20 25 30

Val Ala Pro Trp Pro Leu Ser Gly Phe Glu Arg Pro Ser Trp Asp Glu
35 40 45

Tyr Phe Met Leu Gln Ala Glu Leu Ala Lys Leu Arg Ser Asn Cys Met
50 55 60

Val Arg Lys Val Gly Ala Val Ile Val Arg Asp His Arg Gln Leu Ala
65 70 75 80

Thr Gly Tyr Asn Gly Thr Pro Pro Gly Val Lys Asn Cys Phe Glu Gly
85 90 95

Gly Cys Glu Arg Cys Ile Glu Arg Met Glu Gly Lys Ile Arg Ser Gly
100 105 110

Glu Gly Leu Asp Arg Cys Leu Cys Asn His Ala Glu Ala Asn Ala Ile
115 120 125

Met His Cys Ala Ile Leu Gly Ile Gly Ala Gly Gly Gly Asn Ala Thr
130 135 140

Met Tyr Thr Thr Phe Ser Pro Cys Leu Glu Cys Thr Lys Met Ala Val
145 150 155 160

Thr Ile Gly Ile Arg Arg Phe Val Cys Leu Asp Thr Tyr Pro Glu Asn
165 170 175

Ala Ser Lys Leu Val Lys Asp Ala Ser Ala Ser Ile Thr Met Met Asp
180 185 190

Lys Glu Lys Ile Thr Tyr Trp Ala Ser Arg Met Pro Gly Gly Thr Lys
195 200 205

Glu Val Pro Val Arg
210

<210> 65 
<211> 1512 
<212> DNA 

<213> Cenarchaeum symbiosum 

<220> 
<221> CDS 
<222> (1)...(1512)

<400> 65 
gtg gag acc gca cac ata acg ggc aaa tac gta gag ccc ggc gcc gtc 48
Met Glu Thr Ala His Ile Thr Gly Lys Tyr Val Glu Pro Gly Ala Val
1 5 10 15

gag agg cgc gac tac cag gtg ggc ctt gcc gag cag gcc ata cgg gaa 96
Glu Arg Arg Asp Tyr Gln Val Gly Leu Ala Glu Gln Ala Ile Arg Glu
20 25 30

aac tgc ata gtg gtg ctg cct acc ggc ctc ggc aag acg gcc gtg gcc 144
Asn Cys Ile Val Val Leu Pro Thr Gly Leu Gly Lys Thr Ala Val Ala
35 40 45

ctg cag gtg atc tcc cac tat ttg gac gaa ggc agg ggg gct ctc ttc 192
Leu Gln Val Ile Ser His Tyr Leu Asp Glu Gly Arg Gly Ala Leu Phe
50 55 60

ctt gcg ccg aca agg gtg ctg gta aac cag cac cgc cag ttc ctg ggc 240
Leu Ala Pro Thr Arg Val Leu Val Asn Gln His Arg Gln Phe Leu Gly
65 70 75 80

agg gcc ctt acc ata tcc gat att acc ctg gtc aca ggc gag gac acc 288
Arg Ala Leu Thr Ile Ser Asp Ile Thr Leu Val Thr Gly Glu Asp Thr
85 90 95

gtc ccg agg cgc aaa aaa gct tgg ggc ggc agc gtg atc tgc gcc acc 336
Val Pro Arg Arg Lys Lys Ala Trp Gly Gly Ser Val Ile Cys Ala Thr
100 105 110

ccc gag ata aca aga aac gac ata gcg cgc gga atg gtc ccg ctc gaa 384
Pro Glu Ile Thr Arg Asn Asp Ile Ala Arg Gly Met Val Pro Leu Glu
115 120 125

cag ttc ggc ctg gtt gtg ttc gac gag gcc cac agg gcg gtg ggc gac 432
Gln Phe Gly Leu Val Val Phe Asp Glu Ala His Arg Ala Val Gly Asp
130 135 140

tat gcc tat tcc gca ata gcg cgt gca gtg ggg gag aac tct aga atg 480
Tyr Ala Tyr Ser Ala Ile Ala Arg Ala Val Gly Glu Asn Ser Arg Met
145 150 155 160

atc ggc atg act gcg acc ctt cca agc gag agg gag aaa gcc gac gag 528
Ile Gly Met Thr Ala Thr Leu Pro Ser Glu Arg Glu Lys Ala Asp Glu
165 170 175

ata atg ggc act ctt ctc tca aag agc ata gca caa agg acc gaa gac 576
Ile Met Gly Thr Leu Leu Ser Lys Ser Ile Ala Gln Arg Thr Glu Asp
180 185 190

gac ccg gat gta aag ccc tac gtg cag gag acc gaa act gaa tgg ata 624
Asp Pro Asp Val Lys Pro Tyr Val Gln Glu Thr Glu Thr Glu Trp Ile
195 200 205

aag gtg gag ctg ccc ccg gag atg aag gag atc caa aag ctc ctg aag 672
Lys Val Glu Leu Pro Pro Glu Met Lys Glu Ile Gln Lys Leu Leu Lys
210 215 220

atg gcc ctc gac gaa aga tat gcg gcc ctc aag agg tgc ggc tat gat 720
Met Ala Leu Asp Glu Arg Tyr Ala Ala Leu Lys Arg Cys Gly Tyr Asp
225 230 235 240

ctc ggc tcg aac agg tcg ctc tcg gct ctg ctc cgc ctt cgc atg gtc 768
Leu Gly Ser Asn Arg Ser Leu Ser Ala Leu Leu Arg Leu Arg Met Val
245 250 255

gtt cta agc ggc aac agg cgg gcg gca aag cct ttg ttt act gcg ata 816
Val Leu Ser Gly Asn Arg Arg Ala Ala Lys Pro Leu Phe Thr Ala Ile
260 265 270

cgc atc aca tac gcg ctc aac ata ttc gag gcc cac ggg gtc acg ccg 864
Arg Ile Thr Tyr Ala Leu Asn Ile Phe Glu Ala His Gly Val Thr Pro
275 280 285

ttt cta aag ttc tgc gag agg acc gtc aag aaa aag ggc gcc ggt gtt 912
Phe Leu Lys Phe Cys Glu Arg Thr Val Lys Lys Lys Gly Ala Gly Val
290 295 300

gca gag ctg ttc gag gag gac aga aac ttt aca ggg gcc atg gcg cgc 960
Ala Glu Leu Phe Glu Glu Asp Arg Asn Phe Thr Gly Ala Met Ala Arg
305 310 315 320

gca aag gcg gcg cag gca gcc ggc atg gag cat cca aag ata cca aag 1008
Ala Lys Ala Ala Gln Ala Ala Gly Met Glu His Pro Lys Ile Pro Lys
325 330 335

ttg gaa gag gct gtg cgc ggg gcc aaa ggg aag gcg ctg gtc ttt aca 1056
Leu Glu Glu Ala Val Arg Gly Ala Lys Gly Lys Ala Leu Val Phe Thr
340 345 350

agc tac agg gac tct gtc gat tta ata cac tca aag ctg cag gct gcc 1104
Ser Tyr Arg Asp Ser Val Asp Leu Ile His Ser Lys Leu Gln Ala Ala
355 360 365

ggg ata aac tcg ggg atc ctc ata gga aag gcg gga gaa aag ggc ctc 1152
Gly Ile Asn Ser Gly Ile Leu Ile Gly Lys Ala Gly Glu Lys Gly Leu
370 375 380

aag cag aaa aaa cag gta gag act gtc gcc aag ttc cgc gac ggg gga 1200
Lys Gln Lys Lys Gln Val Glu Thr Val Ala Lys Phe Arg Asp Gly Gly
385 390 395 400

tac gac gtg ctc gta tct aca aga gtg ggc gag gag ggc ctc gac ata 1248
Tyr Asp Val Leu Val Ser Thr Arg Val Gly Glu Glu Gly Leu Asp Ile
405 410 415

tcg gag gta aac ctt gtg gta ttc tat gac aat gtc cca agc tcg ata 1296
Ser Glu Val Asn Leu Val Val Phe Tyr Asp Asn Val Pro Ser Ser Ile
420 425 430

agg tat gtg cag aga agg ggc agg acc ggc agg aag gac gcg ggc aag 1344
Arg Tyr Val Gln Arg Arg Gly Arg Thr Gly Arg Lys Asp Ala Gly Lys
435 440 445

ctg gtg gta ctg atg gca aag ggg act ata gac gag gca tac tac tgg 1392
Leu Val Val Leu Met Ala Lys Gly Thr Ile Asp Glu Ala Tyr Tyr Trp
450 455 460

ata ggc cgg cgc aag att act gcc gcc agg ggc atg ggg gac agg atg 1440
Ile Gly Arg Arg Lys Ile Thr Ala Ala Arg Gly Met Gly Asp Arg Met
465 470 475 480

aac aag tcg ctt gca gcg ggg ggc cct gcg cca aag gca gcc cca aaa 1488
Asn Lys Ser Leu Ala Ala Gly Gly Pro Ala Pro Lys Ala Ala Pro Lys
485 490 495

aag ggg ctc gag ggc tat ttc tag 1512
Lys Gly Leu Glu Gly Tyr Phe *
500

<210> 66 

<211> 503 

<212> PRT 

<213> Cenarchaeum symbiosum 



<400> 66 
Met Glu Thr Ala His Ile Thr Gly Lys Tyr Val Glu Pro Gly Ala Val
1 5 10 15

Glu Arg Arg Asp Tyr Gln Val Gly Leu Ala Glu Gln Ala Ile Arg Glu
20 25 30

Asn Cys Ile Val Val Leu Pro Thr Gly Leu Gly Lys Thr Ala Val Ala
35 40 45

Leu Gln Val Ile Ser His Tyr Leu Asp Glu Gly Arg Gly Ala Leu Phe
50 55 60

Leu Ala Pro Thr Arg Val Leu Val Asn Gln His Arg Gln Phe Leu Gly
65 70 75 80

Arg Ala Leu Thr Ile Ser Asp Ile Thr Leu Val Thr Gly Glu Asp Thr
85 90 95

Val Pro Arg Arg Lys Lys Ala Trp Gly Gly Ser Val Ile Cys Ala Thr
100 105 110

Pro Glu Ile Thr Arg Asn Asp Ile Ala Arg Gly Met Val Pro Leu Glu
115 120 125

Gln Phe Gly Leu Val Val Phe Asp Glu Ala His Arg Ala Val Gly Asp
130 135 140

Tyr Ala Tyr Ser Ala Ile Ala Arg Ala Val Gly Glu Asn Ser Arg Met
145 150 155 160

Ile Gly Met Thr Ala Thr Leu Pro Ser Glu Arg Glu Lys Ala Asp Glu
165 170 175

Ile Met Gly Thr Leu Leu Ser Lys Ser Ile Ala Gln Arg Thr Glu Asp
180 185 190

Asp Pro Asp Val Lys Pro Tyr Val Gln Glu Thr Glu Thr Glu Trp Ile
195 200 205

Lys Val Glu Leu Pro Pro Glu Met Lys Glu Ile Gln Lys Leu Leu Lys
210 215 220

Met Ala Leu Asp Glu Arg Tyr Ala Ala Leu Lys Arg Cys Gly Tyr Asp
225 230 235 240

Leu Gly Ser Asn Arg Ser Leu Ser Ala Leu Leu Arg Leu Arg Met Val
245 250 255

Val Leu Ser Gly Asn Arg Arg Ala Ala Lys Pro Leu Phe Thr Ala Ile
260 265 270

Arg Ile Thr Tyr Ala Leu Asn Ile Phe Glu Ala His Gly Val Thr Pro
275 280 285

Phe Leu Lys Phe Cys Glu Arg Thr Val Lys Lys Lys Gly Ala Gly Val
290 295 300

Ala Glu Leu Phe Glu Glu Asp Arg Asn Phe Thr Gly Ala Met Ala Arg
305 310 315 320

Ala Lys Ala Ala Gln Ala Ala Gly Met Glu His Pro Lys Ile Pro Lys
325 330 335

Leu Glu Glu Ala Val Arg Gly Ala Lys Gly Lys Ala Leu Val Phe Thr
340 345 350

Ser Tyr Arg Asp Ser Val Asp Leu Ile His Ser Lys Leu Gln Ala Ala
355 360 365

Gly Ile Asn Ser Gly Ile Leu Ile Gly Lys Ala Gly Glu Lys Gly Leu
370 375 380

Lys Gln Lys Lys Gln Val Glu Thr Val Ala Lys Phe Arg Asp Gly Gly
385 390 395 400

Tyr Asp Val Leu Val Ser Thr Arg Val Gly Glu Glu Gly Leu Asp Ile
405 410 415

Ser Glu Val Asn Leu Val Val Phe Tyr Asp Asn Val Pro Ser Ser Ile
420 425 430

Arg Tyr Val Gln Arg Arg Gly Arg Thr Gly Arg Lys Asp Ala Gly Lys
435 440 445

Leu Val Val Leu Met Ala Lys Gly Thr Ile Asp Glu Ala Tyr Tyr Trp
450 455 460

Ile Gly Arg Arg Lys Ile Thr Ala Ala Arg Gly Met Gly Asp Arg Met
465 470 475 480

Asn Lys Ser Leu Ala Ala Gly Gly Pro Ala Pro Lys Ala Ala Pro Lys
485 490 495

Lys Gly Leu Glu Gly Tyr Phe
500



<210> 67 
<211> 279 
<212> DNA 

<213> Cenarchaeum symbiosum 



<220> 

<221> CDS 

<222> (1) . . . (279) 



<400> 67 

atg gcg gac aag ata aag tgc tcg cac ata ctg gta aaa aag cag ggc 48
Met Ala Asp Lys Ile Lys Cys Ser His Ile Leu Val Lys Lys Gln Gly
1 5 10 15

gag gcg ctc gca gtg caa gag cgc ctc aag gcg ggc gaa aag ttt gga 96
Glu Ala Leu Ala Val Gln Glu Arg Leu Lys Ala Gly Glu Lys Phe Gly
20 25 30

aag ctg gca aag gag ctc tcg ata gac ggg ggc agc gca aag agg gac 144
Lys Leu Ala Lys Glu Leu Ser Ile Asp Gly Gly Ser Ala Lys Arg Asp
35 40 45

ggc agc ttg ggc tac ttt ggc agg ggc aag atg gta aag ccg ttt gag 192
Gly Ser Leu Gly Tyr Phe Gly Arg Gly Lys Met Val Lys Pro Phe Glu
50 55 60

gat gcc gcg ttc cgc ctg cag gta ggc gag gta tcc gag ccg gta aaa 240
Asp Ala Ala Phe Arg Leu Gln Val Gly Glu Val Ser Glu Pro Val Lys
65 70 75 80

tcc gag ttt ggc tac cac gtg ata aag cgc ctg gga taa 279
Ser Glu Phe Gly Tyr His Val Ile Lys Arg Leu Gly *
85 90



<210> 68 
<211> 92 
<212> PRT 

<213> Cenarchaeum symbiosum 



<400> 68 



Met Ala Asp Lys Ile Lys Cys Ser His Ile Leu Val Lys Lys Gln Gly
1 5 10 15

Glu Ala Leu Ala Val Gln Glu Arg Leu Lys Ala Gly Glu Lys Phe Gly
20 25 30

Lys Leu Ala Lys Glu Leu Ser Ile Asp Gly Gly Ser Ala Lys Arg Asp
35 40 45

Gly Ser Leu Gly Tyr Phe Gly Arg Gly Lys Met Val Lys Pro Phe Glu
50 55 60

Asp Ala Ala Phe Arg Leu Gln Val Gly Glu Val Ser Glu Pro Val Lys
65 70 75 80

Ser Glu Phe Gly Tyr His Val Ile Lys Arg Leu Gly
85 90

<210> 69 
<211> 402 
<212> DNA 

<213> Cenarchaeum symbiosum 

<220> 

<221> CDS 

<222> (1)...(402) 

<400> 69 

atg tct ttg tat ttt acg ata aag acg gcc aac ctg gcc ctg ccc gac 48
Met Ser Leu Tyr Phe Thr Ile Lys Thr Ala Asn Leu Ala Leu Pro Asp
1 5 10 15

gtg gta aag agg tac aac cac gtc ctg gcg tgc aag agc gag gtg atg 96
Val Val Lys Arg Tyr Asn His Val Leu Ala Cys Lys Ser Glu Val Met
20 25 30

agg gcc gag aag cag atc cag gtg tcc atc tcg tcg tcg ggc ggt ctg 144
Arg Ala Glu Lys Gln Ile Gln Val Ser Ile Ser Ser Ser Gly Gly Leu
35 40 45

gac aag tac gcg gag ctc aag cag cag ttc aac tcg agg ata acc gag 192
Asp Lys Tyr Ala Glu Leu Lys Gln Gln Phe Asn Ser Arg Ile Thr Glu
50 55 60

ttc tac cgc tcg ata gag gag ctg gag aag acg ggc gtg gtg gtc aag 240
Phe Tyr Arg Ser Ile Glu Glu Leu Glu Lys Thr Gly Val Val Val Lys
65 70 75 80

agc ata gac gag ggg ctc ctg gac ttt ccc gca aag cgc ttt ggg gac 288
Ser Ile Asp Glu Gly Leu Leu Asp Phe Pro Ala Lys Arg Phe Gly Asp
85 90 95

gac atc tgg ctg tgc tgg aag gtg ggc gag cgc gag atc aag ttc tgg 336
Asp Ile Trp Leu Cys Trp Lys Val Gly Glu Arg Glu Ile Lys Phe Trp
100 105 110

cat gaa aag gac tcg ggg ttt gac gga aga aag ccc ata gag gta agt 384
His Glu Lys Asp Ser Gly Phe Asp Gly Arg Lys Pro Ile Glu Val Ser
115 120 125

gac gag tca cta gtg tag 402
Asp Glu Ser Leu Val *
130



<210> 70 
<211> 133 
<212> PRT 

<213> Cenarchaeum symbiosum 
<400> 70 

Met Ser Leu Tyr Phe Thr Ile Lys Thr Ala Asn Leu Ala Leu Pro Asp
1 5 10 15

Val Val Lys Arg Tyr Asn His Val Leu Ala Cys Lys Ser Glu Val Met
20 25 30

Arg Ala Glu Lys Gln Ile Gln Val Ser Ile Ser Ser Ser Gly Gly Leu
35 40 45

Asp Lys Tyr Ala Glu Leu Lys Gln Gln Phe Asn Ser Arg Ile Thr Glu
50 55 60

Phe Tyr Arg Ser Ile Glu Glu Leu Glu Lys Thr Gly Val Val Val Lys
65 70 75 80

Ser Ile Asp Glu Gly Leu Leu Asp Phe Pro Ala Lys Arg Phe Gly Asp
85 90 95

Asp Ile Trp Leu Cys Trp Lys Val Gly Glu Arg Glu Ile Lys Phe Trp
100 105 110

His Glu Lys Asp Ser Gly Phe Asp Gly Arg Lys Pro Ile Glu Val Ser
115 120 125

Asp Glu Ser Leu Val
130



<210> 71 
<211> 879 
<212> DNA 

<213> Cenarchaeum symbiosum 

<220> 
<221> CDS 
<222> (1)...(879)

<400> 71 

atg ctc tcc tcc tgg ctg cgc gta ata cgc gtc cgg ttc ctg ctc gcg 48
Met Leu Ser Ser Trp Leu Arg Val Ile Arg Val Arg Phe Leu Leu Ala
1 5 10 15

tcg gtg ata gcc gta tca gcg ggc ctt gcc ctc tcc tgg tgg cac ggc 96
Ser Val Ile Ala Val Ser Ala Gly Leu Ala Leu Ser Trp Trp His Gly
20 25 30

cac gga ata gac gcg ctc aca gcg gca ctc acc atg gcc gga gtg gcc 144
His Gly Ile Asp Ala Leu Thr Ala Ala Leu Thr Met Ala Gly Val Ala
35 40 45

gct ctt cat gca agc gtg gac atg ctc aac gac tac tgg gac tac aag 192
Ala Leu His Ala Ser Val Asp Met Leu Asn Asp Tyr Trp Asp Tyr Lys
50 55 60

cgc ggc ata gat acg aga acc aag agg acc ccg atg agc ggg ggg aca 240
Arg Gly Ile Asp Thr Arg Thr Lys Arg Thr Pro Met Ser Gly Gly Thr
65 70 75 80

ggg gtg ctg cca gag ggc ctg ctg agc ccc cgc cag gtg tac cgc gcc 288
Gly Val Leu Pro Glu Gly Leu Leu Ser Pro Arg Gln Val Tyr Arg Ala
85 90 95

ggc atc ata tca ctg gtg ctc ggg act gcc gcc ggc gca tac ttt gtg 336
Gly Ile Ile Ser Leu Val Leu Gly Thr Ala Ala Gly Ala Tyr Phe Val
100 105 110

atc aca acg ggg ccc gtc ata gct gcg ata ctc ggc ttt gcg gtg gtc 384
Ile Thr Thr Gly Pro Val Ile Ala Ala Ile Leu Gly Phe Ala Val Val
115 120 125

tcg att tac ttt tac tcg aca agg att gtg gac tcg ggc ctc tcc gag 432
Ser Ile Tyr Phe Tyr Ser Thr Arg Ile Val Asp Ser Gly Leu Ser Glu
130 135 140

gtg ctc gtc ggg gtc aag ggg gcg atg atc gtc ctt ggc gcc tac tac 480
Val Leu Val Gly Val Lys Gly Ala Met Ile Val Leu Gly Ala Tyr Tyr
145 150 155 160

ata cag gcg ccc gag atc acg ccg gcc gcc ctc ctc gtc ggc gcg gca 528
Ile Gln Ala Pro Glu Ile Thr Pro Ala Ala Leu Leu Val Gly Ala Ala
165 170 175

gtg ggg gcg ctg tca tct gcg gtc ctc ttt gtg gcg tcg ttt ccg gac 576
Val Gly Ala Leu Ser Ser Ala Val Leu Phe Val Ala Ser Phe Pro Asp
180 185 190

cac gac gca gac aag gag cgc ggc aga aaa acg ctg gtg ata ata ctg 624
His Asp Ala Asp Lys Glu Arg Gly Arg Lys Thr Leu Val Ile Ile Leu
195 200 205

ggc aaa aag agg gcc tcg cgc ata ctc tgg gtc ttt cca gct gtg gcg 672
Gly Lys Lys Arg Ala Ser Arg Ile Leu Trp Val Phe Pro Ala Val Ala
210 215 220

tat tca tcc gtg ata gcg ggg gtg att atc cag gtg ctg cca gtg tac 720
Tyr Ser Ser Val Ile Ala Gly Val Ile Ile Gln Val Leu Pro Val Tyr
225 230 235 240

tcc ctc gcc atg ctg ctt gcc gcc ccc ctt gcg gca ata tcg gca agg 768
Ser Leu Ala Met Leu Leu Ala Ala Pro Leu Ala Ala Ile Ser Ala Arg
245 250 255

ggc ctt gcc aaa gag tat gac ggg gac agg atc ata cgg gtc atg cgc 816
Gly Leu Ala Lys Glu Tyr Asp Gly Asp Arg Ile Ile Arg Val Met Arg
260 265 270

ggc acg ctg cgg ttc agc agg act gca ggc gcg ctg ctg gtg ctg gga 864
Gly Thr Leu Arg Phe Ser Arg Thr Ala Gly Ala Leu Leu Val Leu Gly
275 280 285

ata ctg ctt ggt tga 879
Ile Leu Leu Gly *
290



<210> 72 
<211> 292 
<212> PRT 

<213> Cenarchaeum symbiosum 



<400> 72 

Met Leu Ser Ser Trp Leu Arg Val Ile Arg Val Arg Phe Leu Leu Ala
1 5 10 15

Ser Val Ile Ala Val Ser Ala Gly Leu Ala Leu Ser Trp Trp His Gly
20 25 30

His Gly Ile Asp Ala Leu Thr Ala Ala Leu Thr Met Ala Gly Val Ala
35 40 45

Ala Leu His Ala Ser Val Asp Met Leu Asn Asp Tyr Trp Asp Tyr Lys
50 55 60

Arg Gly Ile Asp Thr Arg Thr Lys Arg Thr Pro Met Ser Gly Gly Thr
65 70 75 80

Gly Val Leu Pro Glu Gly Leu Leu Ser Pro Arg Gln Val Tyr Arg Ala
85 90 95

Gly Ile Ile Ser Leu Val Leu Gly Thr Ala Ala Gly Ala Tyr Phe Val
100 105 110

Ile Thr Thr Gly Pro Val Ile Ala Ala Ile Leu Gly Phe Ala Val Val
115 120 125

Ser Ile Tyr Phe Tyr Ser Thr Arg Ile Val Asp Ser Gly Leu Ser Glu
130 135 140

Val Leu Val Gly Val Lys Gly Ala Met Ile Val Leu Gly Ala Tyr Tyr
145 150 155 160

Ile Gln Ala Pro Glu Ile Thr Pro Ala Ala Leu Leu Val Gly Ala Ala
165 170 175

Val Gly Ala Leu Ser Ser Ala Val Leu Phe Val Ala Ser Phe Pro Asp
180 185 190

His Asp Ala Asp Lys Glu Arg Gly Arg Lys Thr Leu Val Ile Ile Leu
195 200 205

Gly Lys Lys Arg Ala Ser Arg Ile Leu Trp Val Phe Pro Ala Val Ala
210 215 220

Tyr Ser Ser Val Ile Ala Gly Val Ile Ile Gln Val Leu Pro Val Tyr
225 230 235 240

Ser Leu Ala Met Leu Leu Ala Ala Pro Leu Ala Ala Ile Ser Ala Arg
245 250 255

Gly Leu Ala Lys Glu Tyr Asp Gly Asp Arg Ile Ile Arg Val Met Arg
260 265 270

Gly Thr Leu Arg Phe Ser Arg Thr Ala Gly Ala Leu Leu Val Leu Gly
275 280 285

Ile Leu Leu Gly
290

<210> 73 
<211> 1227 
<212> DNA 

<213> Cenarchaeum symbiosum 

<220> 

<221> CDS 

<222> (1)...(1227)

<400> 73 

ttg agg ccc gcg gct gtg cct aca gca cgg gat att ggc gca gaa cgg 48
Met Arg Pro Ala Ala Val Pro Thr Ala Arg Asp Ile Gly Ala Glu Arg
1 5 10 15

ggc aat ctc aca ctt tgt acc ctt cat aca cat aaa tcc cgc ttg gat 96
Gly Asn Leu Thr Leu Cys Thr Leu His Thr His Lys Ser Arg Leu Asp
20 25 30

gtg cgg ctg cgc atg atc agc ggg cat gcc acg gcc gag ggt aca cag 144
Val Arg Leu Arg Met Ile Ser Gly His Ala Thr Ala Glu Gly Thr Gln
35 40 45

agg ata gcc gag atg tcc ggc gca cac cat gac aac tac aag gtg gta 192
Arg Ile Ala Glu Met Ser Gly Ala His His Asp Asn Tyr Lys Val Val
50 55 60

gac ggg ctg cac ctc tcc aac gtg ggg atg ggc acc tac ctt ggc gac 240
Asp Gly Leu His Leu Ser Asn Val Gly Met Gly Thr Tyr Leu Gly Asp
65 70 75 80

gcg gat gac gcc acc gac agg gcc gtc aca gac gcg gtc aag agg tca 288
Ala Asp Asp Ala Thr Asp Arg Ala Val Thr Asp Ala Val Lys Arg Ser
85 90 95

atc aag tcg ggg ata aac gtc ata gat acc gcg ata aac tac cgc ctc 336
Ile Lys Ser Gly Ile Asn Val Ile Asp Thr Ala Ile Asn Tyr Arg Leu
100 105 110

cag agg gcc gag cgt tcc gtg ggc agg gcc gtt aca gag ctc tca gag 384
Gln Arg Ala Glu Arg Ser Val Gly Arg Ala Val Thr Glu Leu Ser Glu
115 120 125

gag ggg ctg gta tcc agg gac cag ata ttc ata tcc aca aag gcg gga 432
Glu Gly Leu Val Ser Arg Asp Gln Ile Phe Ile Ser Thr Lys Ala Gly
130 135 140

tac gtg acc aac gat tca gag gtc tcc ctc gac ttt tgg gag tat gta 480
Tyr Val Thr Asn Asp Ser Glu Val Ser Leu Asp Phe Trp Glu Tyr Val
145 150 155 160

aaa aag gaa tac gtc ggt ggc ggc gtc ata cag tcc ggg gac ata tcc 528
Lys Lys Glu Tyr Val Gly Gly Gly Val Ile Gln Ser Gly Asp Ile Ser
165 170 175

tcg gga tac cac tgc atg aag ccc gcg tat cta gag gac cag cta aag 576
Ser Gly Tyr His Cys Met Lys Pro Ala Tyr Leu Glu Asp Gln Leu Lys
180 185 190

aga agc ctt gca aac atg aac gtc gac tgc ata gat ctt gtc tac gtg 624
Arg Ser Leu Ala Asn Met Asn Val Asp Cys Ile Asp Leu Val Tyr Val
195 200 205

cac aac ccg gtg gag ggg cag atc aag gac cgc ccc gtg ccg gag atc 672
His Asn Pro Val Glu Gly Gln Ile Lys Asp Arg Pro Val Pro Glu Ile
210 215 220

ctc gag ggg ata ggc gag gcc ttt gcc atg tac gag aaa atg cgg gag 720
Leu Glu Gly Ile Gly Glu Ala Phe Ala Met Tyr Glu Lys Met Arg Glu
225 230 235 240

gct ggc cgc ata agg tat tac ggg ctc gcc acg tgg gag tgc ttc cgg 768
Ala Gly Arg Ile Arg Tyr Tyr Gly Leu Ala Thr Trp Glu Cys Phe Arg
245 250 255

gtc gca gag ggc gac ccg cag agc atg cag ctc gaa gca gtg gta aaa 816
Val Ala Glu Gly Asp Pro Gln Ser Met Gln Leu Glu Ala Val Val Lys
260 265 270

aag gcc aag gat gcc ggc ggg gag aac cac ggc ttt agg ttc ata cag 864
Lys Ala Lys Asp Ala Gly Gly Glu Asn His Gly Phe Arg Phe Ile Gln
275 280 285

ctg cca ttc aac cag tac ttt gac cag gcc tac atg gta aag aac cag 912
Leu Pro Phe Asn Gln Tyr Phe Asp Gln Ala Tyr Met Val Lys Asn Gln
290 295 300

ggg acg ggc ggc ggc aag tca tcc ata ctg gag gcg gca gcc gcg ctg 960
Gly Thr Gly Gly Gly Lys Ser Ser Ile Leu Glu Ala Ala Ala Ala Leu
305 310 315 320

gac att ggc gtg ttc aca agc gtc ccg ttc atg cag ggc aag ctg ctc 1008
Asp Ile Gly Val Phe Thr Ser Val Pro Phe Met Gln Gly Lys Leu Leu
325 330 335

gag cct ggc ctg ctg ccg gag ttt ggc ggg ctc tcg ccc gcc ctg cgg 1056
Glu Pro Gly Leu Leu Pro Glu Phe Gly Gly Leu Ser Pro Ala Leu Arg
340 345 350

tcc ctg cag ttc atc agg tct aca ccg gga gtg ctt gcc ccc ctg ccg 1104
Ser Leu Gln Phe Ile Arg Ser Thr Pro Gly Val Leu Ala Pro Leu Pro
355 360 365

ggg cac aag tcc agc ctg cat aca gac gag aac cta aag atc atg ggc 1152
Gly His Lys Ser Ser Leu His Thr Asp Glu Asn Leu Lys Ile Met Gly
370 375 380

gtg ccc ccc att cct cct gac aag ttc ggg gag ctt gtg gcc agc ctt 1200
Val Pro Pro Ile Pro Pro Asp Lys Phe Gly Glu Leu Val Ala Ser Leu
385 390 395 400

acc tca tgg tcg ccc ggc cag aaa tag 1227
Thr Ser Trp Ser Pro Gly Gln Lys *
405



<210> 74 
<211> 408 
<212> PRT 

<213> Cenarchaeum symbiosum 
<400> 74 

Met Arg Pro Ala Ala Val Pro Thr Ala Arg Asp Ile Gly Ala Glu Arg
1 5 10 15

Gly Asn Leu Thr Leu Cys Thr Leu His Thr His Lys Ser Arg Leu Asp
20 25 30

Val Arg Leu Arg Met Ile Ser Gly His Ala Thr Ala Glu Gly Thr Gln
35 40 45

Arg Ile Ala Glu Met Ser Gly Ala His His Asp Asn Tyr Lys Val Val
50 55 60

Asp Gly Leu His Leu Ser Asn Val Gly Met Gly Thr Tyr Leu Gly Asp
65 70 75 80

Ala Asp Asp Ala Thr Asp Arg Ala Val Thr Asp Ala Val Lys Arg Ser
85 90 95

Ile Lys Ser Gly Ile Asn Val Ile Asp Thr Ala Ile Asn Tyr Arg Leu
100 105 110

Gln Arg Ala Glu Arg Ser Val Gly Arg Ala Val Thr Glu Leu Ser Glu
115 120 125

Glu Gly Leu Val Ser Arg Asp Gln Ile Phe Ile Ser Thr Lys Ala Gly
130 135 140

Tyr Val Thr Asn Asp Ser Glu Val Ser Leu Asp Phe Trp Glu Tyr Val
145 150 155 160

Lys Lys Glu Tyr Val Gly Gly Gly Val Ile Gln Ser Gly Asp Ile Ser
165 170 175

Ser Gly Tyr His Cys Met Lys Pro Ala Tyr Leu Glu Asp Gln Leu Lys
180 185 190

Arg Ser Leu Ala Asn Met Asn Val Asp Cys Ile Asp Leu Val Tyr Val
195 200 205

His Asn Pro Val Glu Gly Gln Ile Lys Asp Arg Pro Val Pro Glu Ile
210 215 220

Leu Glu Gly Ile Gly Glu Ala Phe Ala Met Tyr Glu Lys Met Arg Glu
225 230 235 240

Ala Gly Arg Ile Arg Tyr Tyr Gly Leu Ala Thr Trp Glu Cys Phe Arg
245 250 255

Val Ala Glu Gly Asp Pro Gln Ser Met Gln Leu Glu Ala Val Val Lys
260 265 270

Lys Ala Lys Asp Ala Gly Gly Glu Asn His Gly Phe Arg Phe Ile Gln
275 280 285

Leu Pro Phe Asn Gln Tyr Phe Asp Gln Ala Tyr Met Val Lys Asn Gln
290 295 300

Gly Thr Gly Gly Gly Lys Ser Ser Ile Leu Glu Ala Ala Ala Ala Leu
305 310 315 320

Asp Ile Gly Val Phe Thr Ser Val Pro Phe Met Gln Gly Lys Leu Leu
325 330 335

Glu Pro Gly Leu Leu Pro Glu Phe Gly Gly Leu Ser Pro Ala Leu Arg
340 345 350

Ser Leu Gln Phe Ile Arg Ser Thr Pro Gly Val Leu Ala Pro Leu Pro
355 360 365

Gly His Lys Ser Ser Leu His Thr Asp Glu Asn Leu Lys Ile Met Gly
370 375 380

Val Pro Pro Ile Pro Pro Asp Lys Phe Gly Glu Leu Val Ala Ser Leu
385 390 395 400

Thr Ser Trp Ser Pro Gly Gln Lys
405



<210> 75 
<211> 1077 
<212> DNA 

<213> Cenarchaeum symbiosum 

<220> 
<221> CDS 

<222> (1) . . . (1077) 



<400> 75 

atg aac aac cgg ttc cag gtt atc cgg ggg gat gcc cgg gcg gtg ctg 48
Met Asn Asn Arg Phe Gln Val Ile Arg Gly Asp Ala Arg Ala Val Leu
1 5 10 15

ccc agg ctt gca aaa aag aat ggc gag cgc ggc agg tac agg ctg gcc 96
Pro Arg Leu Ala Lys Lys Asn Gly Glu Arg Gly Arg Tyr Arg Leu Ala
20 25 30

gtc act tcc ccc ccg tat tac ggg cac aga aag tac ggg tcg gat ccc 144
Val Thr Ser Pro Pro Tyr Tyr Gly His Arg Lys Tyr Gly Ser Asp Pro
35 40 45

tcc gag ctg ggc cag gag ggg acg cct gat gag ttc gtc gag gag ctg 192
Ser Glu Leu Gly Gln Glu Gly Thr Pro Asp Glu Phe Val Glu Glu Leu
50 55 60

gca ggg gtg ttc aag agc tgc atg gac ctg ctt acc gac gac ggc agc 240
Ala Gly Val Phe Lys Ser Cys Met Asp Leu Leu Thr Asp Asp Gly Ser
65 70 75 80

ctc ttc ata gtg ata ggc gac acc cgg agg cgg cgc cgg aag ctg atg 288
Leu Phe Ile Val Ile Gly Asp Thr Arg Arg Arg Arg Arg Lys Leu Met
85 90 95

gtc ccg cac cgg ctc gcg ctc aga ctt gta gac ctt ggg tac cac ttt 336
Val Pro His Arg Leu Ala Leu Arg Leu Val Asp Leu Gly Tyr His Phe
100 105 110

caa gag gat ata gtc tgg tac aag aaa aac gcg cta tca cag agc tcg 384
Gln Glu Asp Ile Val Trp Tyr Lys Lys Asn Ala Leu Ser Gln Ser Ser
115 120 125

aag cag aac ctt acg cag gcg tac gag ttt gtg ctg gtg cta tca aag 432
Lys Gln Asn Leu Thr Gln Ala Tyr Glu Phe Val Leu Val Leu Ser Lys
130 135 140

tcg gaa tcc ccc gcc ttt gac ata gac ccg ata cgc gtc cag ggc aac 480
Ser Glu Ser Pro Ala Phe Asp Ile Asp Pro Ile Arg Val Gln Gly Asn
145 150 155 160

gag gcc ctg agc ggg gtc aac agg aag ccg gag cgc gac cgg ctg cag 528
Glu Ala Leu Ser Gly Val Asn Arg Lys Pro Glu Arg Asp Arg Leu Gln
165 170 175

ttc tcc ccc ggg agg agg gac cct gaa gcc ata ggg agg att gca gca 576
Phe Ser Pro Gly Arg Arg Asp Pro Glu Ala Ile Gly Arg Ile Ala Ala
180 185 190

gtg ata cac ggc tcg tcc ccc gag acg ccg ttt gac gag ctg cca acc 624
Val Ile His Gly Ser Ser Pro Glu Thr Pro Phe Asp Glu Leu Pro Thr
195 200 205

acc gag gag ata tcg cgg gcc cac ggg tat gac ccc gaa aag cac tgc 672
Thr Glu Glu Ile Ser Arg Ala His Gly Tyr Asp Pro Glu Lys His Cys
210 215 220

ccg aca tgc tac cgc aag ttc aaa agg cat gcg acg cgc aag cgg ata 720
Pro Thr Cys Tyr Arg Lys Phe Lys Arg His Ala Thr Arg Lys Arg Ile
225 230 235 240

ggg ggc cac gag cac tat ccg ata ttt gca gca tgc aac ccc cgg ggc 768
Gly Gly His Glu His Tyr Pro Ile Phe Ala Ala Cys Asn Pro Arg Gly
245 250 255

aag aac cct ggg aac gtc tgg gag ata tcc aca aag gcg cac cac ggc 816
Lys Asn Pro Gly Asn Val Trp Glu Ile Ser Thr Lys Ala His His Gly
260 265 270

aac gag cac ttt gcg gtg ttc cca gaa gac ctc gta tcc cgg ata gta 864
Asn Glu His Phe Ala Val Phe Pro Glu Asp Leu Val Ser Arg Ile Val
275 280 285

aag ttt gcc aca aga gag ggc gac tat gtg ctg gat ccg ttt gcg gga 912
Lys Phe Ala Thr Arg Glu Gly Asp Tyr Val Leu Asp Pro Phe Ala Gly
290 295 300

agg ggc aca acg ggg ata gtc tcg gcg tgc ctc aag agg ggc ttt acg 960
Arg Gly Thr Thr Gly Ile Val Ser Ala Cys Leu Lys Arg Gly Phe Thr
305 310 315 320

gga ata gac ctg tat cct gcc aac gtg gac agg acc cgg cgc aat gtg 1008
Gly Ile Asp Leu Tyr Pro Ala Asn Val Asp Arg Thr Arg Arg Asn Val
325 330 335

aaa gat tct gcg gac tcg aag ctg cca aaa aag gtg cta gac cag ata 1056
Lys Asp Ser Ala Asp Ser Lys Leu Pro Lys Lys Val Leu Asp Gln Ile
340 345 350

atg ccc gag gga aca cgc tga 1077
Met Pro Glu Gly Thr Arg *
355



<210> 76 
<211> 358 
<212> PRT 

<213> Cenarchaeum symbiosum 



<400> 76 



Met Asn Asn Arg Phe Gln Val Ile Arg Gly Asp Ala Arg Ala Val Leu
1 5 10 15

Pro Arg Leu Ala Lys Lys Asn Gly Glu Arg Gly Arg Tyr Arg Leu Ala
20 25 30

Val Thr Ser Pro Pro Tyr Tyr Gly His Arg Lys Tyr Gly Ser Asp Pro
35 40 45

Ser Glu Leu Gly Gln Glu Gly Thr Pro Asp Glu Phe Val Glu Glu Leu
50 55 60

Ala Gly Val Phe Lys Ser Cys Met Asp Leu Leu Thr Asp Asp Gly Ser
65 70 75 80

Leu Phe Ile Val Ile Gly Asp Thr Arg Arg Arg Arg Arg Lys Leu Met
85 90 95

Val Pro His Arg Leu Ala Leu Arg Leu Val Asp Leu Gly Tyr His Phe
100 105 110

Gln Glu Asp Ile Val Trp Tyr Lys Lys Asn Ala Leu Ser Gln Ser Ser
115 120 125

Lys Gln Asn Leu Thr Gln Ala Tyr Glu Phe Val Leu Val Leu Ser Lys
130 135 140

Ser Glu Ser Pro Ala Phe Asp Ile Asp Pro Ile Arg Val Gln Gly Asn
145 150 155 160

Glu Ala Leu Ser Gly Val Asn Arg Lys Pro Glu Arg Asp Arg Leu Gln
165 170 175

Phe Ser Pro Gly Arg Arg Asp Pro Glu Ala Ile Gly Arg Ile Ala Ala
180 185 190

Val Ile His Gly Ser Ser Pro Glu Thr Pro Phe Asp Glu Leu Pro Thr
195 200 205

Thr Glu Glu Ile Ser Arg Ala His Gly Tyr Asp Pro Glu Lys His Cys
210 215 220

Pro Thr Cys Tyr Arg Lys Phe Lys Arg His Ala Thr Arg Lys Arg Ile
225 230 235 240

Gly Gly His Glu His Tyr Pro Ile Phe Ala Ala Cys Asn Pro Arg Gly
245 250 255

Lys Asn Pro Gly Asn Val Trp Glu Ile Ser Thr Lys Ala His His Gly
260 265 270

Asn Glu His Phe Ala Val Phe Pro Glu Asp Leu Val Ser Arg Ile Val
275 280 285

Lys Phe Ala Thr Arg Glu Gly Asp Tyr Val Leu Asp Pro Phe Ala Gly
290 295 300

Arg Gly Thr Thr Gly Ile Val Ser Ala Cys Leu Lys Arg Gly Phe Thr
305 310 315 320

Gly Ile Asp Leu Tyr Pro Ala Asn Val Asp Arg Thr Arg Arg Asn Val
325 330 335

Lys Asp Ser Ala Asp Ser Lys Leu Pro Lys Lys Val Leu Asp Gln Ile
340 345 350

Met Pro Glu Gly Thr Arg
355

<210> 77 
<211> 468 
<212> DNA 

<213> Cenarchaeum symbiosum 

<220> 

<221> CDS 

<222> (1) . . . (468) 

<400> 77 

atg cgg ctg ccc cgg cgc cga ctt aaa atc gtt gta gga tgc ggc gcc 48
Met Arg Leu Pro Arg Arg Arg Leu Lys Ile Val Val Gly Cys Gly Ala
1 5 10 15

gca gat gca ttg ccc gcc tta tac acc gcc cgg gat cgg ccg cct tgc 96
Ala Asp Ala Leu Pro Ala Leu Tyr Thr Ala Arg Asp Arg Pro Pro Cys
20 25 30

agc aca cgc agt ata aac ggg ggc ccg ggc ggc gcg tat cac atg tgg 144
Ser Thr Arg Ser Ile Asn Gly Gly Pro Gly Gly Ala Tyr His Met Trp
35 40 45

ata aag gac gaa ttc ctc ggc ccg ggc aac aag atg agg ctg ctc tac 192
Ile Lys Asp Glu Phe Leu Gly Pro Gly Asn Lys Met Arg Leu Leu Tyr
50 55 60

ctg ata ctg ccc atc tat ggg tat atc ttt ctg gag tac tat ccg ttc 240
Leu Ile Leu Pro Ile Tyr Gly Tyr Ile Phe Leu Glu Tyr Tyr Pro Phe
65 70 75 80

ttt ccc tgg atg gcc acc tac tgg tgg tca gta gct ctc agc ccc ccg 288
Phe Pro Trp Met Ala Thr Tyr Trp Trp Ser Val Ala Leu Ser Pro Pro
85 90 95

ata gtg ccc acg cat tat gcc ggg gag gcc ctg ggg cgg ctg atc ggg 336
Ile Val Pro Thr His Tyr Ala Gly Glu Ala Leu Gly Arg Leu Ile Gly
100 105 110

gat cac gta ttg ttt ggc atc acc aca aag tac gtc tat gcg gca ata 384
Asp His Val Leu Phe Gly Ile Thr Thr Lys Tyr Val Tyr Ala Ala Ile
115 120 125

tgg ctc ggc atg gcc cat ggg ata atc ctg ctg gca ggg cgc ctc cgg 432
Trp Leu Gly Met Ala His Gly Ile Ile Leu Leu Ala Gly Arg Leu Arg
130 135 140

gga cct agg cag gcg cca cgg acg ggc atc cca tag 468
Gly Pro Arg Gln Ala Pro Arg Thr Gly Ile Pro *
145 150 155



<210> 78 
<211> 155 
<212> PRT 

<213> Cenarchaeum symbiosum 
<400> 78 

Met Arg Leu Pro Arg Arg Arg Leu Lys Ile Val Val Gly Cys Gly Ala
1 5 10 15

Ala Asp Ala Leu Pro Ala Leu Tyr Thr Ala Arg Asp Arg Pro Pro Cys
20 25 30

Ser Thr Arg Ser Ile Asn Gly Gly Pro Gly Gly Ala Tyr His Met Trp
35 40 45

Ile Lys Asp Glu Phe Leu Gly Pro Gly Asn Lys Met Arg Leu Leu Tyr
50 55 60

Leu Ile Leu Pro Ile Tyr Gly Tyr Ile Phe Leu Glu Tyr Tyr Pro Phe
65 70 75 80

Phe Pro Trp Met Ala Thr Tyr Trp Trp Ser Val Ala Leu Ser Pro Pro
85 90 95

Ile Val Pro Thr His Tyr Ala Gly Glu Ala Leu Gly Arg Leu Ile Gly
100 105 110

Asp His Val Leu Phe Gly Ile Thr Thr Lys Tyr Val Tyr Ala Ala Ile
115 120 125

Trp Leu Gly Met Ala His Gly Ile Ile Leu Leu Ala Gly Arg Leu Arg
130 135 140

Gly Pro Arg Gln Ala Pro Arg Thr Gly Ile Pro
145 150 155

<210> 79 
<211> 1779 
<212> DNA

<213> Cenarchaeum symbiosum 

<220> 

<221> CDS 

<222> (1) . . . (1779)

<400> 79 

ttg aag ctg caa ggc aag act gcc gtg atc acc ggc agt ggt acc ggg 48
Met Lys Leu Gln Gly Lys Thr Ala Val Ile Thr Gly Ser Gly Thr Gly
1 5 10 15

atc ggg ctg gcg gtg gca agg aaa ttt gcc gag aac ggg gcc agc gtg 96
Ile Gly Leu Ala Val Ala Arg Lys Phe Ala Glu Asn Gly Ala Ser Val
20 25 30

gta ata ctc gga agg aga aag gag ccc ctc gat gag gca gca gca gag 144
Val Ile Leu Gly Arg Arg Lys Glu Pro Leu Asp Glu Ala Ala Ala Glu
35 40 45

ctc aaa aag ata gcg gaa tct gca ggc tgc ggg gcc tcg atc agg ata 192
Leu Lys Lys Ile Ala Glu Ser Ala Gly Cys Gly Ala Ser Ile Arg Ile
50 55 60

ttc gcc ggg gtg gac gtg gcc gac gaa tcc gcg ata acg aaa atg ttc 240
Phe Ala Gly Val Asp Val Ala Asp Glu Ser Ala Ile Thr Lys Met Phe
65 70 75 80

gac gag ctg tcc agc tca ggt gta acc gtg gac ata ctg gtg aac aat 288
Asp Glu Leu Ser Ser Ser Gly Val Thr Val Asp Ile Leu Val Asn Asn
85 90 95

gcc ggc gtg tcg ggg ccc gtc acg tgc ttt gcc aac aat gat cta gaa 336
Ala Gly Val Ser Gly Pro Val Thr Cys Phe Ala Asn Asn Asp Leu Glu
100 105 110

gag ttc cgc ggg gca gtc gac ata cac ctg acc ggc tcc ttc tgg aca 384
Glu Phe Arg Gly Ala Val Asp Ile His Leu Thr Gly Ser Phe Trp Thr
115 120 125

tcg agg gag gcc ctc aag gtc atg aaa aag ggc tcc aag att gtc acc 432
Ser Arg Glu Ala Leu Lys Val Met Lys Lys Gly Ser Lys Ile Val Thr
130 135 140

atg act acg ttt ttt gca gaa gag agg cca ctc gag cag agg ccg tac 480
Met Thr Thr Phe Phe Ala Glu Glu Arg Pro Leu Glu Gln Arg Pro Tyr
145 150 155 160

agg ttc cgc gac ccg tat aca acc gca cag ggc gca aag aac agg ctc 528
Arg Phe Arg Asp Pro Tyr Thr Thr Ala Gln Gly Ala Lys Asn Arg Leu
165 170 175

gcc gag gcg atg tcg tgg gat ctt tta gac cgc ggg ata aca tcg ata 576
Ala Glu Ala Met Ser Trp Asp Leu Leu Asp Arg Gly Ile Thr Ser Ile
180 185 190

gcg acc aac ccc ggc ccc gtc cat tct gac agg ata tac aag acg gta 624
Ala Thr Asn Pro Gly Pro Val His Ser Asp Arg Ile Tyr Lys Thr Val
195 200 205

tac ccg agg gcg gca ctc gag ttt gtc agg gtt tca ggg ttt gag gac 672
Tyr Pro Arg Ala Ala Leu Glu Phe Val Arg Val Ser Gly Phe Glu Asp
210 215 220

ctg cag cca gaa gaa gtc gag gtg gca ggc ggc agg cta atc cac ctg 720
Leu Gln Pro Glu Glu Val Glu Val Ala Gly Gly Arg Leu Ile His Leu
225 230 235 240

ctc ggc gcg gac gac gat gca aga aaa aaa ggc ata gca gag gcc gca 768
Leu Gly Ala Asp Asp Asp Ala Arg Lys Lys Gly Ile Ala Glu Ala Ala
245 250 255

gag cac ttt gcc aag cta aag ccc gtg gat ccc gca aag cta gag gcc 816
Glu His Phe Ala Lys Leu Lys Pro Val Asp Pro Ala Lys Leu Glu Ala
260 265 270

acc ctt gat gcc ctg ctc gca aag atc aag ggg ata gcc gaa aag ata 864
Thr Leu Asp Ala Leu Leu Ala Lys Ile Lys Gly Ile Ala Glu Lys Ile
275 280 285

cag gcc aac act gca agg atg ata cca gac ggg gag ttt ctc tcc cag 912
Gln Ala Asn Thr Ala Arg Met Ile Pro Asp Gly Glu Phe Leu Ser Gln
290 295 300

gac cag gtg gcc gag acg gta ctc gcc ctc tgc gat gac aag atg gcc 960
Asp Gln Val Ala Glu Thr Val Leu Ala Leu Cys Asp Asp Lys Met Ala
305 310 315 320

aag acg gta aac ggc cgc gta atc ccc gcc gac agg gta ttc tac ccg 1008
Lys Thr Val Asn Gly Arg Val Ile Pro Ala Asp Arg Val Phe Tyr Pro
325 330 335

gta agg gcg cat gtg gcc aat gcc gct ccg cgc gtg ccc ccg cac gac 1056
Val Arg Ala His Val Ala Asn Ala Ala Pro Arg Val Pro Pro His Asp
340 345 350

tat tcc ggg gga tgc gtc cta ttc atg ata gat gca gca gac gac agg 1104
Tyr Ser Gly Gly Cys Val Leu Phe Met Ile Asp Ala Ala Asp Asp Arg
355 360 365

gat gta gaa agg gcg acc gcc ctg gca tcc cat gtg gaa agc cac ggg 1152
Asp Val Glu Arg Ala Thr Ala Leu Ala Ser His Val Glu Ser His Gly
370 375 380

ggc acg gca gtc tgc ata gtc tca gaa gac tcg ccc cgc gcg gca aag 1200
Gly Thr Ala Val Cys Ile Val Ser Glu Asp Ser Pro Arg Ala Ala Lys
385 390 395 400

gag atg ata gcg tca aag ttc cac tcg cat gcg agc cac ata gac aag 1248
Glu Met Ile Ala Ser Lys Phe His Ser His Ala Ser His Ile Asp Lys
405 410 415

gta gac gag ata aac agg tgg ctg agc gct gca tca aca aag ata ggc 1296
Val Asp Glu Ile Asn Arg Trp Leu Ser Ala Ala Ser Thr Lys Ile Gly
420 425 430

ccc ata tct gca gtg gtc cac ctg tcc ggc agg atg cca aaa tcc ggc 1344
Pro Ile Ser Ala Val Val His Leu Ser Gly Arg Met Pro Lys Ser Gly
435 440 445

agc cta atg gat ctc tcc aga aaa gaa tgg gac gcg ctg gtt gac agg 1392
Ser Leu Met Asp Leu Ser Arg Lys Glu Trp Asp Ala Leu Val Asp Arg
450 455 460

ttc ata ggg acg ccg gct gcc gtc ctg cac agg tcg ctt gag cac ttt 1440
Phe Ile Gly Thr Pro Ala Ala Val Leu His Arg Ser Leu Glu His Phe
465 470 475 480

gca ccc ggc ggg cgc aag gac ccc cgt ttg ttc aag ggc aag agc ggc 1488
Ala Pro Gly Gly Arg Lys Asp Pro Arg Leu Phe Lys Gly Lys Ser Gly
485 490 495

gtc atc gtg ata ata ggc ccc gac ctg ccc gcg ggg aaa aag gcc tcc 1536
Val Ile Val Ile Ile Gly Pro Asp Leu Pro Ala Gly Lys Lys Ala Ser
500 505 510

ggc gcc gag agg gca agg gcg gag atc ttc cgg ggt gcg ctc agg ccg 1584
Gly Ala Glu Arg Ala Arg Ala Glu Ile Phe Arg Gly Ala Leu Arg Pro
515 520 525

ctg acg act aca gtc aac cag gag ctc agc gat gtg cta aag tca aac 1632
Leu Thr Thr Thr Val Asn Gln Glu Leu Ser Asp Val Leu Lys Ser Asn
530 535 540

gtg cgc ctg ttt acc atc ctt ccc ggc agg gcg gac ggg ggc gag acc 1680
Val Arg Leu Phe Thr Ile Leu Pro Gly Arg Ala Asp Gly Gly Glu Thr
545 550 555 560

gat gat tcc cgc ata tct gct gca atc gac tac ttt ctg acc ccc gag 1728
Asp Asp Ser Arg Ile Ser Ala Ala Ile Asp Tyr Phe Leu Thr Pro Glu
565 570 575

gct gtc tcg tcc ggc gag gtc ata ttc tgc gta gac gag aac agg ggc 1776
Ala Val Ser Ser Gly Glu Val Ile Phe Cys Val Asp Glu Asn Arg Gly
580 585 590

tag 1779
*

<210> 80 
<211> 592 
<212> PRT 

<213> Cenarchaeum symbiosum 
<400> 80 

Met Lys Leu Gln Gly Lys Thr Ala Val Ile Thr Gly Ser Gly Thr Gly
1 5 10 15

Ile Gly Leu Ala Val Ala Arg Lys Phe Ala Glu Asn Gly Ala Ser Val
20 25 30

Val Ile Leu Gly Arg Arg Lys Glu Pro Leu Asp Glu Ala Ala Ala Glu
35 40 45

Leu Lys Lys Ile Ala Glu Ser Ala Gly Cys Gly Ala Ser Ile Arg Ile
50 55 60

Phe Ala Gly Val Asp Val Ala Asp Glu Ser Ala Ile Thr Lys Met Phe
65 70 75 80

Asp Glu Leu Ser Ser Ser Gly Val Thr Val Asp Ile Leu Val Asn Asn
85 90 95

Ala Gly Val Ser Gly Pro Val Thr Cys Phe Ala Asn Asn Asp Leu Glu
100 105 110

Glu Phe Arg Gly Ala Val Asp Ile His Leu Thr Gly Ser Phe Trp Thr
115 120 125

Ser Arg Glu Ala Leu Lys Val Met Lys Lys Gly Ser Lys Ile Val Thr
130 135 140

Met Thr Thr Phe Phe Ala Glu Glu Arg Pro Leu Glu Gln Arg Pro Tyr
145 150 155 160

Arg Phe Arg Asp Pro Tyr Thr Thr Ala Gln Gly Ala Lys Asn Arg Leu
165 170 175

Ala Glu Ala Met Ser Trp Asp Leu Leu Asp Arg Gly Ile Thr Ser Ile
180 185 190

Ala Thr Asn Pro Gly Pro Val His Ser Asp Arg Ile Tyr Lys Thr Val
195 200 205

Tyr Pro Arg Ala Ala Leu Glu Phe Val Arg Val Ser Gly Phe Glu Asp
210 215 220

Leu Gln Pro Glu Glu Val Glu Val Ala Gly Gly Arg Leu Ile His Leu
225 230 235 240

Leu Gly Ala Asp Asp Asp Ala Arg Lys Lys Gly Ile Ala Glu Ala Ala
245 250 255

Glu His Phe Ala Lys Leu Lys Pro Val Asp Pro Ala Lys Leu Glu Ala
260 265 270

Thr Leu Asp Ala Leu Leu Ala Lys Ile Lys Gly Ile Ala Glu Lys Ile
275 280 285

Gln Ala Asn Thr Ala Arg Met Ile Pro Asp Gly Glu Phe Leu Ser Gln
290 295 300

Asp Gln Val Ala Glu Thr Val Leu Ala Leu Cys Asp Asp Lys Met Ala
305 310 315 320

Lys Thr Val Asn Gly Arg Val Ile Pro Ala Asp Arg Val Phe Tyr Pro
325 330 335

Val Arg Ala His Val Ala Asn Ala Ala Pro Arg Val Pro Pro His Asp
340 345 350

Tyr Ser Gly Gly Cys Val Leu Phe Met Ile Asp Ala Ala Asp Asp Arg
355 360 365

Asp Val Glu Arg Ala Thr Ala Leu Ala Ser His Val Glu Ser His Gly
370 375 380

Gly Thr Ala Val Cys Ile Val Ser Glu Asp Ser Pro Arg Ala Ala Lys
385 390 395 400

Glu Met Ile Ala Ser Lys Phe His Ser His Ala Ser His Ile Asp Lys
405 410 415

Val Asp Glu Ile Asn Arg Trp Leu Ser Ala Ala Ser Thr Lys Ile Gly
420 425 430

Pro Ile Ser Ala Val Val His Leu Ser Gly Arg Met Pro Lys Ser Gly
435 440 445

Ser Leu Met Asp Leu Ser Arg Lys Glu Trp Asp Ala Leu Val Asp Arg
450 455 460

Phe Ile Gly Thr Pro Ala Ala Val Leu His Arg Ser Leu Glu His Phe
465 470 475 480

Ala Pro Gly Gly Arg Lys Asp Pro Arg Leu Phe Lys Gly Lys Ser Gly
485 490 495

Val Ile Val Ile Ile Gly Pro Asp Leu Pro Ala Gly Lys Lys Ala Ser
500 505 510

Gly Ala Glu Arg Ala Arg Ala Glu Ile Phe Arg Gly Ala Leu Arg Pro
515 520 525

Leu Thr Thr Thr Val Asn Gln Glu Leu Ser Asp Val Leu Lys Ser Asn
530 535 540

Val Arg Leu Phe Thr Ile Leu Pro Gly Arg Ala Asp Gly Gly Glu Thr
545 550 555 560

Asp Asp Ser Arg Ile Ser Ala Ala Ile Asp Tyr Phe Leu Thr Pro Glu
565 570 575

Ala Val Ser Ser Gly Glu Val Ile Phe Cys Val Asp Glu Asn Arg Gly
580 585 590

<210> 81 
<211> 40 
<212> DNA 

<213> Cenarchaeum symbiosum 
<220> 

<221> TATA_signal 
<222> (11) . . . (16) 

<400> 81 

aagctagact tttaattggg atccggcggg gcggcgcatg 40 



<210> 82 
<211> 40
<212> DNA 

<213> Cenarchaeum symbiosum 
<220> 

<221> TATA_signal
<222> (11) . . . (16) 

<400> 82 

aagctaaact tttaattggg atccggcgag ccggcgcgtg 40 

<210> 83 
<211> 41 
<212> DNA 

<213> Cenarchaeum symbiosum 
<220> 

<221> TATA_signal
<222> (11) . . . (16) 

<400> 83 

ggaaactttg attatacggg cgtgctgccc cggggcccat g 41 

<210> 84 
<211> 41 
<212> DNA 

<213> Cenarchaeum symbiosum 
<220> 

<221> TATA_signal
<222> (11)... (16) 

<400> 84 

ggaaactttg attatacggg cgtacattcc cggggcccat g 41 

<210> 85 
<211> 42 
<212> DNA 

<213> Cenarchaeum symbiosum 
<220> 

<221> TATA_signal 
<222> (11) . . . (16) 

<400> 85 

aaggcaaggt aataatagcc tgccgtctgt aacggccgta tg 42 

<210> 86 
<211> 42 
<212> DNA 

<213> Cenarchaeum symbiosum 
<220> 

<221> TATA_signal 
<222> (11) . . . (16) 



<400> 86 

acggcaaggt aataatagcc tgccgtccgt acctgccgta tg 42

<210> 87
<211> 42 
<212> DNA 

<213> Cenarchaeum symbiosum
<220> 

<221> TATA_signal 
<222> (11) . . . (16) 

<400> 87 

catggaacta gatattaacc ggttccgcgg atcccatgca tg 42 

<210> 88 
<211> 42 
<212> DNA 

<213> Cenarchaeum symbiosum 
<220> 

<221> TATA_signal 
<222> (11) . . . (16) 

<400> 88 

catggaacta gataataacc ggtcccgcgg gtacaatgca tg 42 

<210> 89 
<211> 43 
<212> DNA 

<213> Cenarchaeum symbiosum 
<220> 

<221> TATA_signal
<222> (11) . . . (16) 

<400> 89 

ataccgagaa gttatagcag ggtatggaat gtgcgcgcgc atg 43 

<210> 90 
<211> 43 
<212> DNA 

<213> Cenarchaeum symbiosum 
<220> 

<221> TATA_signal 
<222> (11) . . . (16) 

<400> 90 

agcacgacaa gttatagcag ggtacaaagg agcagcgcac atg 43 

<210> 91 

<211> 43 

<212> DNA 

<213> Cenarchaeum symbiosum
<220> 

<221> TATA_signal 

<222> (11) . . . (16) 




<400> 91 

atccgccctg attaaattat ggggggagcg gcctgctgcc gtg 

<210> 92 
<211> 43 
<212> DNA 

<213> Cenarchaeum symbiosum
<220> 

<221> TATA_signal 
<222> (11) . . . (16) 

<400> 92 

atccggcctc attaaattac ggggggtaca acctgctgcc gtg 

<210> 93 
<211> 43 
<212> DNA 

<213> Cenarchaeum symbiosum 
<220> 

<221> TATA_signal 
<222> (11) . . . (16) 

<400> 93 

ccttcataca cataaatccc gcttggatgt gcggctgcgc atg 

<210> 94 
<211> 43
<212> DNA 

<213> Cenarchaeum symbiosum 
<220> 

<221> TATA_signal 
<222> (11) . . . (16) 

<400> 94 

acttcataca cataaatccc gcctgaacgg tcgtccgcgc atg 

<210> 95 
<211> 43 
<212> DNA 

<213> Cenarchaeum symbiosum 
<220> 

<221> TATA_signal 
<222> (10) . . . (15) 

<400> 95 

ggcatatacc ataatatgcc gggcggtggc accatggccg ttg 

<210> 96 
<211> 43 
<212> DNA 

<213> Cenarchaeum symbiosum 
<220> 

<221> TATA_signal 
<222> (11) . . . (16)
<400> 96 

ccgcatatac cataatatgc cgggcggggg caggctgccc gtg 

<210> 97 

<211> 44 

<212> DNA 

<213> Cenarchaeum symbiosum 
<220> 

<221> TATA_signal 

<222> (11) . . . (16) 

<400> 97 

tgtacgaaac cataaaacaa caggccgcgt cagggccgcg cgtg 

<210> 98 
<211> 43 
<212> DNA 

<213> Cenarchaeum symbiosum 
<220> 

<221> TATA_signal 
<222> (11) . . . (16) 

<400> 98 

gggtagaaac cataaaacaa caggccgcgg cagggcgcgc gtg 

<210> 99 

<211> 42 

<212> DNA 

<213> Cenarchaeum symbiosum 
<220> 

<221> TATA_signal 

<222> (9)... (14) 

<400> 99 

acacgcagta taaacggggg cccgggcggc gcgtatcaca tg 

<210> 100 
<211> 43 
<212> DNA 

<213> Cenarchaeum symbiosum 

<220> 

<221> TATA_signal
<222> (11) . . . (16) 

<400> 100 

atacacgtgg tataaacaga ggccggacgg cgcggaccac atg 

<210> 101 
<211> 44 
<212> DNA 

<213> Cenarchaeum symbiosum 
<220>

<221> TATA_signal 
<222> (11) . . . (16) 

<400> 101 

gcgatagtta tttaaaacta ggatgccgat cacggatcgt ccca 

<210> 102 

<211> 44 

<212> DNA 

<213> Cenarchaeum symbiosum 
<220> 

<221> TATA_signal

<222> (11) . . . (16) 

<400> 102 

gcgatagtta tttaaaacta ggatgccggg cacccgtcgt ccca 

<210> 103 
<211> 44 
<212> DNA 

<213> Cenarchaeum symbiosum 
<220> 

<221> TATA_signal 
<222> (11) . . . (16) 

<400> 103 

ccgggccccg gttaaaatag cgcacgggcg gatcctgacc aatg 

<210> 104 

<211> 45 

<212> DNA 

<213> Cenarchaeum symbiosum 
<220> 

<221> TATA_signal 

<222> (11) . . . (16) 

<400> 104 

ccgggccccg gttaaaatag agtgcggccg ggcaccggat caatg 

<210> 105 
<211> 51 
<212> DNA 

<213> Cenarchaeum symbiosum 
<220> 

<221> TATA_signal 
<222> (11) . . . (16)

<400> 105 

gcgtcgatag aataaatacg cgcagggggc cccgtggcgc gatcgcccgt 

<210> 106 
<211> 47
<212> DNA
<213> Cenarchaeum symbiosum
<220> 

<221> TATA_signal 
<222> (11) . . . (16) 

<400> 106 

gcgtcgatag aataaatacg cgcggggccg cggtgcgatc gcccgtg 47

<210> 107 
<211> 60 
<212> DNA 

<213> Cenarchaeum symbiosum 
<220> 

<221> TATA_signal 
<222> (11) . . . (16) 

<400> 107 

atttcaacta cataaatgcc tagttacgca gaaatagcaa acgacgtact tcgactaatg 60 

<210> 108 
<211> 60 
<212> DNA 

<213> Cenarchaeum symbiosum 
<220> 

<221> TATA_signal 
<222> (11) . . . (16) 

<400> 108 

acttcaacta cataaatgcc tagctacgca gaaatatcaa acaaagtact tcgactaatg 60 

<210> 109 
<211> 67 
<212> DNA 

<213> Cenarchaeum symbiosum 
<220> 

<221> TATA_signal 
<222> (11) . . . (16) 

<400> 109 

acggcaggct attattacct tgccttgcgt tgtatagtat gccttatgcg gggtgcggca 60 

ggggatg 67 

<210> 110 
<211> 66 
<212> DNA 

<213> Cenarchaeum symbiosum 
<220> 

<221> TATA_signal 
<222> (11) . . . (16) 



<400> 110 

acggcaggct attattacct tgccgtgtgt acagggcatg ccggatgagg gggcctgccg 60 
ggagtg 66 
<210> 111
<211> 121 
<212> DNA 

<213> Cenarchaeum symbiosum 
<220> 

<221> TATA_signal 
<222> (11) . . . (16) 

<400> 111 

ctacaacgat tttaagtcgg cgccggggca gccgcataga atgtgtatga cccgtaggat 60 

cgcgcggccc gcctgctgcg cagatctgtc cgtccagcct gatgtggggc aggcaacatg 120 

a 121 

<210> 112 
<211> 98 
<212> DNA 

<213> Cenarchaeum symbiosum 
<220> 

<221> TATA_signal 
<222> (11) . . . (16) 

<400> 112 

ctacaaagat tttaagacgg cgcgggtgcc gcggtacaag atgaatacga cttgtcggat 60 
cgcgcagggg cagatggatg gcacgggggc ctatcttg 98 

<210> 113 
<211> 236 
<212> DNA 

<213> Cenarchaeum symbiosum 
<220> 

<221> TATA_signal 
<222> (11) . . . (16) 

<400> 113 

tcggcgatgg tttatatgcc catggacggg ccgatccgat cgtacgtgac gcaagagcgg 60 

cgcttgcgat gaatgcatgg tattgtacca tattgtgatt cgctggcctc cagttacgca 120 

cacagaatga gggtatgatc gaagggtcat atctgagatg tgaagattat gtgcattctg 180 

ttcaattcca aaagtacaag cgtacttaac aaaaaaaaaa taatccaatt atgaat 236 

<210> 114 

<211> 235 

<212> DNA 

<213> Cenarchaeum symbiosum 
<220> 

<221> TATA_signal 

<222> (11)... (16) 

<400> 114 

ccggcgatgg tttatatgcc catggacaag gcgatccgat cgtacgtgac gcaagagcgg 60 

cgcttgcgat gaagccatgg tattgtacca ttttgtgatt cgcaggcctc cagttacgca 120 

cacagaatga ggatctgatc gaagggtcat atctgagatg tgaagattat gtgcattccg 180 

ttcaattcca aaagtacagg cgtactttga aaaaaaaaat aatccaaata agaat 235 
<210> 115
<211> 20
<212> DNA

<213> Artificial Sequence 
<220> 

<223> Oligonucleotide 
<400> 115 

gtgctccccc gccaattcct 20 

<210> 116 
<211> 15 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligonucleotide 
<400> 116 

ctttccctca cggta 15 

<210> 117 
<211> 19 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligonucleotide 
<400> 117 

ctattgccgt ctttacacc 19 

<210> 118 
<211> 21 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligonucleotide 
<400> 118 

gaatccgccc ccgactatct t 21 

<210> 119 
<211> 18 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligonucleotide 
<400> 119 

catggcttag tatcaatc 18

<210> 120 
<211> 23 
<212> DNA 
<213> Artificial Sequence
<220>

<223> Oligonucleotide

<221> modified_base
<222> (3)...(3)
<223> I

<221> modified_base
<222> (12)...(12)
<223> I

<400> 120
acntacaacg gngacgaytt tga 23

<210> 121 
<211> 21 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligonucleotide 

<400> 121

caccccgaar tagttyttyt t 21



<210> 122 
<211> 19
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligonucleotide 

<400> 122
acacttcaac tatttcctg 19

<210> 123 
<211> 19 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> Oligonucleotide 

<400> 123

acactttgac tatttcgtg 19



